How can I reduce my AWS bill?
AWS cost optimisation was the biggest challenge my team and I faced in the last two months. We saw a huge spike in our AWS bill for May 2020, which came close to $10k.
Since we are a startup, optimising costs and resources is very important. We had to dive deep into our infrastructure costs to find the loopholes and optimise our spending on priority. Our product runs around 20–25 microservices, along with a few other components like Redis, Kafka and MongoDB, on AWS.
We started to monitor costs in great detail and spent a good amount of time every day understanding the breakup of the costs. We started with AWS's own monitoring tools:
- Billing Dashboard: gives a high-level view of your main costs (Amazon S3, Amazon EC2, etc.) and a forecast that was rarely accurate, at least for us.
- Detailed Billing Report: has to be enabled in your account preferences. It delivers a daily gzipped .csv file containing one line per billable item since the beginning of the month.
- Cost Explorer: an interesting tool for quickly identifying trends, but limited in that you cannot build complete dashboards with it. It is mainly a reporting tool.
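The Detailed Billing Report is just a gzipped CSV, so it is straightforward to aggregate locally. A minimal sketch, assuming the legacy detailed-billing column names `ProductName` and `Cost` (adjust them to match the headers in your own report):

```python
import csv
import gzip
import io
from collections import defaultdict

def cost_by_service(gzipped_csv: bytes) -> dict:
    """Sum costs per service from a gzipped detailed-billing CSV."""
    totals = defaultdict(float)
    with gzip.open(io.BytesIO(gzipped_csv), mode="rt") as f:
        for row in csv.DictReader(f):
            service = row.get("ProductName") or "unknown"
            try:
                totals[service] += float(row.get("Cost") or 0)
            except ValueError:
                continue  # summary rows may carry non-numeric cost fields
    return dict(totals)
```

Running this over each day's file and diffing the totals is enough to see which service moved overnight.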
Step 1: Reviewed the current month's cost breakdown and compared it with the previous month's to analyse what went wrong.
Our account showed 35 TB of data processed by NAT Gateways and over 144 TB of usage under "regional data transfer — in/out/between EC2 AZ's or using elastic IP's or ELB", which had cost us more than $1k.
We quickly understood that data transfer and NAT costs were the culprits, but we needed more information to optimise them.
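A back-of-the-envelope check makes it clear why the NAT line item alone hurts. The $0.045/GB data-processing rate below is an assumption based on typical us-east-1 pricing at the time; check the pricing page for your own region:

```python
# Assumed NAT Gateway data-processing rate, $/GB (verify for your region).
NAT_PROCESSING_RATE = 0.045

nat_tb = 35  # TB processed by NAT Gateways, from our bill
nat_cost = nat_tb * 1024 * NAT_PROCESSING_RATE
print(f"Estimated NAT processing cost: ${nat_cost:,.2f}")
```

At that rate, 35 TB of NAT-processed traffic is roughly $1.6k before you even count the per-hour NAT Gateway charge.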
Step 2: Used AWS Cost Explorer
AWS Cost Explorer gives a clear picture of cost and usage data, with daily reports broken down by service, usage type, etc. We started documenting the daily usage of resources, which helped us discover opportunities for further cost optimisation.
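The same breakdown is available programmatically through the Cost Explorer `GetCostAndUsage` API. A small helper sketch for flattening a daily, service-grouped response into rows for a usage document; with boto3 the response would come from `boto3.client("ce").get_cost_and_usage(...)`, here we only parse its documented shape:

```python
def flatten_cost_response(response: dict) -> list:
    """Turn a GetCostAndUsage response (DAILY, grouped by SERVICE)
    into (date, service, cost) rows."""
    rows = []
    for day in response.get("ResultsByTime", []):
        date = day["TimePeriod"]["Start"]
        for group in day.get("Groups", []):
            service = group["Keys"][0]
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            rows.append((date, service, amount))
    return rows
```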
Step 3: Communicated with AWS support
As we were confused by some of the terminology in AWS billing, we had a conversation with the AWS support team. This made our progress faster in figuring out where we were actually going wrong!
Step 4: Used AWS VPC Flow Logs to monitor traffic coming in and going out of all network interfaces
We enabled VPC Flow Logs, which helped us track the bytes flowing in and out of our VPC. We quickly figured out that a few of our services communicating over the public internet were the reason for the huge increase in NAT costs.
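Flow log records make the heavy flows easy to spot once you sum bytes per source/destination pair. A sketch assuming the default VPC Flow Logs record format (`version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status`):

```python
from collections import defaultdict

def top_talkers(lines, n=3):
    """Return the n (src, dst) pairs with the most accepted bytes."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        # Skip short/NODATA lines and rejected flows.
        if len(fields) < 14 or fields[12] != "ACCEPT":
            continue
        src, dst, nbytes = fields[3], fields[4], int(fields[9])
        totals[(src, dst)] += nbytes
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Any flow whose destination is a public IP but whose source sits behind a NAT Gateway is a candidate for a VPC endpoint or an internal route.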
Now that we had the data, how did we optimise?
1 — Unused infrastructure
We eliminated the following unused services/resources:
- Detached Elastic IPs (EIPs): they are free while attached to a running EC2 instance, but you pay for them when they are not.
- The block stores (EBS) attached to EC2 instances are preserved when you stop your instances. We cleaned up the unwanted snapshots.
- Additional Load Balancers (ELB): we consolidated multiple services behind a single load balancer.
- Since all our applications are dockerised, we moved containers running on multiple instances to ECS (Elastic Container Service). This made it easier to maintain and scale our Docker containers, and let us eliminate 15–20 instances.
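Finding detached EIPs is a one-liner once you have the `DescribeAddresses` response. A sketch that filters the documented response shape; with boto3 (not shown, as it needs credentials) the input would be `boto3.client("ec2").describe_addresses()["Addresses"]`:

```python
def detached_eips(addresses: list) -> list:
    """Return public IPs of Elastic IPs not attached to anything."""
    return [
        addr["PublicIp"]
        for addr in addresses
        if "AssociationId" not in addr and "InstanceId" not in addr
    ]
```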
2 — Optimise Amazon S3
The second source of optimisation was object management on S3. Storage is cheap and effectively infinite, but that is not a valid reason to keep all your data there forever. We cleaned up S3 and applied different lifecycle policies to buckets based on their usage.
Finally, we also learnt that enabling a VPC Endpoint for Amazon S3 keeps that traffic off the NAT Gateway, eliminating the data transfer costs between Amazon S3 and your instances.
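As an illustration, a bucket lifecycle configuration of the kind we applied might look like this (the `logs/` prefix and the day counts are hypothetical examples, not our actual policy):

```json
{
  "Rules": [
    {
      "ID": "archive-then-expire-logs",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 365 }
    }
  ]
}
```

Objects move to cheaper storage classes as they age and are deleted after a year, instead of sitting in Standard forever.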
3 — Leverage the Spot market
Spot Instances let you use AWS's spare computing capacity at a heavily discounted price.
Since Spot Instances are easy to use for non-critical batch workloads and useful for data processing, they are a very good match for Amazon Elastic MapReduce. We deployed a few services on EMR and ECS with Spot Instances.
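On EMR, instance fleets make mixing Spot and On-Demand capacity declarative. A sketch of one fleet definition (instance type, capacities and the 60% bid cap are illustrative values, not our production settings):

```json
{
  "InstanceFleetType": "CORE",
  "TargetOnDemandCapacity": 1,
  "TargetSpotCapacity": 4,
  "InstanceTypeConfigs": [
    {
      "InstanceType": "m5.xlarge",
      "BidPriceAsPercentageOfOnDemandPrice": 60
    }
  ],
  "LaunchSpecifications": {
    "SpotSpecification": {
      "TimeoutDurationMinutes": 10,
      "TimeoutAction": "SWITCH_TO_ON_DEMAND"
    }
  }
}
```

The timeout action means the cluster falls back to On-Demand capacity if Spot cannot be fulfilled, so batch jobs still run.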
4 — Data transfer
AWS charges not only for data transfer from its services to the internet, but also for transfers between AWS Availability Zones.
We observed that our instances were spread across multiple AZs.
- All our applications/components communicate with each other, so we relocated them to the same AZ.
- Used a managed service for Kafka, since its inter-AZ replication costs are built into the pricing.
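A rough model shows how the "regional data transfer" line item adds up. The $0.01/GB rate is an assumption based on commonly cited pricing (verify for your region), and note that a GB crossing an AZ boundary is billed on both sides, so it can appear twice in the usage figure:

```python
# Assumed rate per GB of billed regional-transfer usage.
INTER_AZ_RATE = 0.01

def regional_transfer_cost(billed_gb: float) -> float:
    """Cost of the billed cross-AZ usage at the assumed rate."""
    return billed_gb * INTER_AZ_RATE

# The 144 TB of billed usage from our account, in GB.
print(f"${regional_transfer_cost(144 * 1024):,.2f}")
```

Chatty microservices split across AZs quietly turn into four-figure monthly charges at this rate, which is why co-locating them in one AZ paid off for us.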
By following multiple blogs and the advice of several senior DevOps engineers, we brought costs down 40–45% from the initial spike we saw in May 2020. Finally, implementing an alerting system will help even further in tracking your billing.
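For that alerting, one option is AWS Budgets. A minimal budget definition of the kind that could be passed to `aws budgets create-budget` (the $5,000 monthly limit is just an example figure):

```json
{
  "BudgetName": "monthly-aws-spend",
  "BudgetLimit": { "Amount": "5000", "Unit": "USD" },
  "TimeUnit": "MONTHLY",
  "BudgetType": "COST"
}
```

Paired with a notification threshold (say, 80% of the limit), this would have flagged our May spike weeks earlier.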
Thank you for reading.