Prematurely optimising for cloud bill in serverless apps

The pay-per-use pricing model of serverless is one of its great benefits. Developers are now aware of the costs of the cloud services they’re consuming and can make architectural decisions and write code based on this to deliver operationally and financially efficient applications.

But this pricing model can also be quite a curse. Awareness of the unit cost of a specific service or API call can cause developers to over-optimise for cloud billing cost, and the price of doing so is a more complex codebase and possibly extra architectural moving parts. That complexity will of course have its own future cost in the extra engineering hours needed to support it.

An example of this is a question I posed on Twitter earlier today asking:

“is using TransactWriteItems a better default approach for writing a load of items to DynamoDB than using BatchWriteItem, even if you don’t need the atomicity & isolation that the former provides?”

I haven’t totally settled on my conclusion for this yet as it could get quite use-case-specific (detailed notes here if you’re interested). But my reason for highlighting this is that both operations can do the same job for a common use case (writing multiple items to DynamoDB in a single request), yet TransactWriteItems is twice as expensive as BatchWriteItem. Crucially, however, with BatchWriteItem the onus is on the developer to handle the partial failure case where some writes succeed and some fail. This adds significant overhead to your code, and you may need to add an SQS queue and a separate Lambda function to process the retries asynchronously. This solution would still likely be cheaper in terms of cloud bill than using TransactWriteItems (since write errors would be very infrequent), but you now have more code and cloud resources to maintain and monitor.
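To give a feel for that overhead, here is a minimal sketch of the simplest in-process version of the retry handling, assuming a boto3-style client whose `batch_write_item` call returns any unwritten requests under `UnprocessedItems`. The `batch_write_with_retry` helper, the `FlakyClient` stand-in and the `orders` table name are hypothetical and exist only for illustration. It is not the SQS-based design mentioned above, just the minimum you would probably end up writing.

```python
import time


def batch_write_with_retry(client, request_items, max_attempts=5,
                           base_delay=0.05, sleep=time.sleep):
    """Call BatchWriteItem and resubmit UnprocessedItems with exponential backoff.

    Returns whatever is still unprocessed after max_attempts
    (an empty dict means every write went through).
    """
    pending = request_items
    for attempt in range(max_attempts):
        response = client.batch_write_item(RequestItems=pending)
        pending = response.get("UnprocessedItems") or {}
        if not pending:
            return {}
        if attempt < max_attempts - 1:
            # Back off before resubmitting only the requests that failed.
            sleep(base_delay * (2 ** attempt))
    return pending


class FlakyClient:
    """Hypothetical stand-in for a DynamoDB client (illustration only).

    For the first `fail_first_n_calls` calls it reports the last request
    of each table as unprocessed; after that it accepts everything.
    """

    def __init__(self, fail_first_n_calls):
        self.fail_first_n_calls = fail_first_n_calls
        self.calls = 0
        self.written = []

    def batch_write_item(self, RequestItems):
        self.calls += 1
        unprocessed = {}
        for table, requests in RequestItems.items():
            if self.calls <= self.fail_first_n_calls:
                self.written.extend(requests[:-1])
                unprocessed[table] = requests[-1:]
            else:
                self.written.extend(requests)
        return {"UnprocessedItems": unprocessed}


if __name__ == "__main__":
    items = {"orders": [{"PutRequest": {"Item": {"pk": {"S": str(i)}}}}
                        for i in range(3)]}
    client = FlakyClient(fail_first_n_calls=2)
    leftover = batch_write_with_retry(client, items, sleep=lambda _: None)
    print("calls:", client.calls, "leftover:", leftover)
```

Even this small version leaves you with a question to answer: what happens to whatever is still in the returned leftovers once the attempts run out? That is where the queue and the second Lambda function tend to come in, and none of it is needed with the single transactional call.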

I’ll try to generalise this. Say you’re faced with two solution approaches, A and B, for a specific use case, and cloud cost and implementation complexity are the two biggest factors influencing you either way. Solution A involves less code but will produce a more expensive cloud bill, while B is more complex in terms of code and architecture but will produce a cheaper one. How do you decide?

Sometimes the decision will be clear without further research if there’s a huge differential in one of the two factors. But if you can’t predict what usage levels to expect or your predicted cloud billing savings aren’t significant, then I would default to option A, and optimise for less code and fewer moving parts. Assuming you’re monitoring your cloud bill, you still have the option of optimising later if you notice costs getting out of hand in a certain area. But if you start with the more complex option B, you start paying for that code overhead immediately, since engineers are your biggest cost.
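If you want to put rough numbers on that trade-off, a back-of-the-envelope break-even calculation is one way to do it. The sketch below is only an illustration: the function names are hypothetical, and every figure in the example (write volume, price per million write units, hours, hourly rate) is a made-up placeholder rather than real AWS pricing or real project data. The only thing taken from the discussion above is that option A uses twice the write units of option B.

```python
def monthly_saving(writes_millions, price_per_million_units,
                   units_a=2, units_b=1):
    """Monthly cloud-bill saving from choosing B over A.

    units_a / units_b are write units consumed per write by each option
    (A costing twice B mirrors the TransactWriteItems example).
    """
    return writes_millions * price_per_million_units * (units_a - units_b)


def breakeven_months(extra_engineering_hours, hourly_rate,
                     saving_per_month, upkeep_hours_per_month=0.0):
    """Months until B's extra engineering cost is repaid by its savings.

    Returns None when ongoing upkeep eats the whole saving,
    i.e. option B never pays for itself.
    """
    net_saving = saving_per_month - upkeep_hours_per_month * hourly_rate
    if net_saving <= 0:
        return None
    return (extra_engineering_hours * hourly_rate) / net_saving


if __name__ == "__main__":
    # All inputs below are hypothetical placeholders.
    saving = monthly_saving(writes_millions=10, price_per_million_units=1.0)
    months = breakeven_months(extra_engineering_hours=16, hourly_rate=75,
                              saving_per_month=saving)
    print("saving per month:", saving, "break-even months:", months)
```

With placeholder inputs like these, the saving takes years to repay two days of engineering time, and any monthly upkeep on the extra moving parts pushes the break-even point out further or removes it altogether.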
