How This Company Saved Millions In DynamoDB Costs: A Case Study

How you can do the same by applying the lessons learned and avoiding a common problem many DynamoDB users face.

Read on Beehiiv | Nov 3rd, 2024

Welcome to The Cloud Economist!

Last week I introduced AWS Budget Alerts and explained how they are an invaluable tool for optimizing and reducing costs and staying actively vigilant about your spend.

The articles outlined key considerations including understanding your current usage, setting objectives, categorizing budgets, and establishing monitoring practices to effectively manage AWS costs.

I also elaborated on AWS Billing Alerts as a real-time cost management tool, offering several best practices for implementation while emphasizing how these alerts can be integrated with other AWS cost optimization services to prevent unexpected expenses.

If you missed that edition you can grab it here.

In last week’s question, I asked:

What are dummy alerts in AWS Billing alerts and how do they help you detect unintended usage (that can bring up unnecessary costs)?

Answer:

Dummy alerts have varying use cases, but one valuable use for maintaining security and controlling costs is setting a small budget amount on the services you’d like to monitor for unintended usage.

For example, if you don’t use Lambda in production, set a $1 budget alert to catch if someone on your team accidentally deploys a function.

(Naturally, you should use IAM to regulate access to services, but this protects against unintended usage.)
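If you want to set one of these dummy alerts up programmatically, here’s a minimal sketch using boto3 and the AWS Budgets API. The account ID, email address, and the service-filter string are placeholders, not values from the article (the Service value has to match the Cost Explorer service name, e.g. "AWS Lambda").

```python
import boto3

budgets = boto3.client("budgets")

# A $1 "dummy" budget scoped to Lambda, alerting as soon as any actual
# spend appears. Account ID, email, and service name are placeholders.
budgets.create_budget(
    AccountId="123456789012",
    Budget={
        "BudgetName": "dummy-lambda-usage-alert",
        "BudgetType": "COST",
        "TimeUnit": "MONTHLY",
        "BudgetLimit": {"Amount": "1", "Unit": "USD"},
        "CostFilters": {"Service": ["AWS Lambda"]},
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 1.0,  # alert at 1% of the $1 budget
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "ops@example.com"}
            ],
        }
    ],
)
```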

This week, I bring you a fascinating case study about a company that encountered a million-dollar database problem and solved it with intelligent DynamoDB design and features.

Here are the best articles I’ve found on cloud cost savings this week, summarized.

The company Farshidoo worked with had 700 billion records stored in their DynamoDB database. Many of those records held data about inactive or irrelevant customers, and this unnecessary data was incurring very high costs for the company.

By leveraging DynamoDB TTLs, the team was able to automate the deletion of records that no longer served business objectives.
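The case study doesn’t include the team’s code, but enabling TTL is a one-call change. Here’s a minimal sketch with boto3; the table and attribute names are my own placeholders.

```python
import time
import boto3

dynamodb = boto3.client("dynamodb")

# Tell DynamoDB which attribute holds the expiry timestamp (epoch seconds).
# Table and attribute names are placeholders, not the team's real ones.
dynamodb.update_time_to_live(
    TableName="Customers",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# Stamp new items with an expiry; DynamoDB deletes them for free some time
# after that timestamp passes (typically within a few days).
ninety_days = 90 * 24 * 60 * 60
dynamodb.put_item(
    TableName="Customers",
    Item={
        "pk": {"S": "customer#123"},
        "sk": {"S": "profile"},
        "expires_at": {"N": str(int(time.time()) + ninety_days)},
    },
)
```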

However, the major challenge was efficiently scanning and removing older data without affecting live traffic to their application.

TLDR: the key takeaways include setting appropriate data retention (TTLs), scanning in segments to avoid throttling live traffic, and fine-tuning DynamoDB’s autoscaling for optimal read and write capacity.
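On the autoscaling point, here is one way to tune a provisioned table’s read capacity via the Application Auto Scaling API. The capacity numbers and table name are placeholders; the case study doesn’t publish the team’s actual settings.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the table's read capacity as a scalable target (placeholder bounds).
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/Customers",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=100,
    MaxCapacity=4000,
)

# Target-tracking policy: scale to keep read utilization around 70%.
autoscaling.put_scaling_policy(
    PolicyName="customers-read-target-tracking",
    ServiceNamespace="dynamodb",
    ResourceId="table/Customers",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)
```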

Part 2 of the case study describes the architecture of the large-scale DynamoDB table scanner the team designed.

The team built a system to scan and clean up the massive database tables by splitting the work into two main parts:

  • "Readers" that check which data needs to be updated or deleted

  • "Updaters" that make those changes.

To handle such a huge amount of data efficiently, they created a system that could work on multiple parts of the database at once, while being careful not to disrupt the live traffic.
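The write-up doesn’t include source code, but DynamoDB’s built-in Segment/TotalSegments scan parameters are the natural way to run several Readers over disjoint slices of a table. Below is a minimal sketch of that idea; the table name, attribute names, and the is_stale() rule are my own assumptions.

```python
import time
import boto3
from concurrent.futures import ThreadPoolExecutor

dynamodb = boto3.client("dynamodb")

TABLE_NAME = "Customers"   # placeholder table name
TOTAL_SEGMENTS = 8         # one Reader per segment, each scanning a disjoint slice


def is_stale(item: dict) -> bool:
    # Hypothetical business rule: no activity in the last two years.
    two_years_ago = time.time() - 2 * 365 * 24 * 60 * 60
    return float(item.get("last_active", {}).get("N", "0")) < two_years_ago


def read_segment(segment: int) -> list[dict]:
    """Scan one segment of the table and collect the keys of stale items."""
    stale_keys = []
    kwargs = {
        "TableName": TABLE_NAME,
        "Segment": segment,
        "TotalSegments": TOTAL_SEGMENTS,
        "ProjectionExpression": "pk, sk, last_active",  # read only what the rule needs
    }
    while True:
        resp = dynamodb.scan(**kwargs)
        stale_keys.extend(
            {"pk": item["pk"], "sk": item["sk"]}
            for item in resp["Items"]
            if is_stale(item)
        )
        if "LastEvaluatedKey" not in resp:
            return stale_keys
        kwargs["ExclusiveStartKey"] = resp["LastEvaluatedKey"]


# Run the Readers in parallel; the Updaters would consume their output.
with ThreadPoolExecutor(max_workers=TOTAL_SEGMENTS) as pool:
    results = list(pool.map(read_segment, range(TOTAL_SEGMENTS)))
```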

They also made sure the system could handle failures gracefully: progress was tracked so operations could be paused and resumed when needed, with the intermediate data stored in S3 files instead of a more expensive queuing system.
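The article only says progress was checkpointed to S3. One simple way to do that is to write each segment’s LastEvaluatedKey to a small JSON object and resume from it later as ExclusiveStartKey. A sketch, with the bucket name and key layout as assumptions:

```python
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "my-scan-progress"  # placeholder bucket name


def save_checkpoint(segment: int, last_evaluated_key: dict) -> None:
    # One tiny JSON object per segment; overwriting it is cheap and idempotent.
    s3.put_object(
        Bucket=BUCKET,
        Key=f"checkpoints/segment-{segment}.json",
        Body=json.dumps(last_evaluated_key),
    )


def load_checkpoint(segment: int) -> dict | None:
    # Returns the key to resume from, or None if this segment hasn't started yet.
    try:
        obj = s3.get_object(Bucket=BUCKET, Key=f"checkpoints/segment-{segment}.json")
        return json.loads(obj["Body"].read())
    except s3.exceptions.NoSuchKey:
        return None
```

On resume, the loaded key is passed to the scan as ExclusiveStartKey, so each segment picks up exactly where it stopped.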

The scanner also included safety features such as automatic retries for file uploads and downloads and an emergency stop button.

During the actual scanning process, which took 84 days for the largest table, they faced and solved several challenges including managing costs effectively by fine-tuning the speed between Readers and Updaters.
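The write-up doesn’t show how the team paced the two sides. A common approach is a small token-bucket limiter so the Updaters never consume more than a fixed share of write capacity; the WCU budget below is an arbitrary placeholder.

```python
import time


class RateLimiter:
    """Token-bucket limiter capping how many write capacity units per second
    the Updaters may consume, leaving headroom for live traffic. Assumes each
    batch's WCU cost is at or below the per-second budget."""

    def __init__(self, units_per_second: float):
        self.rate = units_per_second
        self.tokens = units_per_second
        self.last = time.monotonic()

    def acquire(self, units: float) -> None:
        while True:
            now = time.monotonic()
            self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= units:
                self.tokens -= units
                return
            time.sleep((units - self.tokens) / self.rate)


# Example: cap the Updaters at ~500 WCU/s (placeholder figure).
limiter = RateLimiter(500)
# Before each batch of deletes:
#   limiter.acquire(estimated_wcu_for_batch)
#   ...issue the BatchWriteItem call...
```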

The project ultimately succeeded in reducing costs by removing inactive user data. It also proved valuable beyond its immediate goals, providing deeper insights into DynamoDB’s capabilities and underscoring the importance of thorough testing for sensitive data operations.

The end result: the company Farshidoo worked with reduced the storage costs of its massive DynamoDB tables, kept operational costs low, and potentially saved millions of dollars.

One Tip on Cloud Cost Savings

The most important takeaway from this case study (in my opinion) is the importance of using TTLs.

Designing your database with TTLs (in DynamoDB especially) is one of the most valuable strategies for controlling costs. (In fact, the entire story above could have been avoided with proper use of TTLs!)

The advantage is that data deleted through TTL is removed free of charge - TTL deletions don’t consume any write capacity. I have emphasized this often before, and I constantly hear stories about stale data costing businesses thousands - in some cases millions - of dollars.

Bottom Line: Use TTLs - they’re free!

This Week’s Question

What are some use cases where scanning an entire database can remain efficient and low-cost?

Check back here next week for the answer!

Until next week.

The Cloud Economist