Yesterday I was answering a question on the DynamoDBPlace forum about a full-text search use case where the existing implementation (DynamoDB) wasn’t a good fit, and the most optimised solution would involve introducing a new specialised service such as Elasticsearch or Algolia. However, in this person’s case, that more optimised solution may not be the best one for them, as it comes with its own costs, and their existing solution (which “smells”, in their words) may suffice.
This got me thinking more generally about how to break down non-trivial architectural decisions such as this into component parts to identify the costs involved in each option.
At a high level, costs fall under two broad categories: engineering costs and cloud billing costs. Both are important to consider, but in my experience most cloud engineers tend to over-index on the billing cost (which is much easier to measure), when it’s usually the engineering cost that dominates. Beware of this bias.
The engineering costs can be further broken down into two subcategories: initial building costs and ongoing maintenance costs. Maintenance cost is the harder of the two to quantify, because the questions it raises aren’t about certain costs but about risks, each with a probability and an impact that can be fuzzy to estimate.
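One rough way to put a number on these risks is to estimate a probability and an impact for each one and sum the products as an expected cost. The sketch below assumes impact is measured in engineering days; the `Risk` class, the `expected_maintenance_cost` function and all of the estimates are hypothetical, purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One ongoing-maintenance risk for an option (hypothetical model)."""
    description: str
    probability: float  # rough chance it happens over your planning horizon, 0..1
    impact_days: float  # rough engineering days lost if it does happen


def expected_maintenance_cost(risks: list[Risk]) -> float:
    """Sum of probability x impact across all risks, in engineering days."""
    for r in risks:
        if not 0.0 <= r.probability <= 1.0:
            raise ValueError(f"probability out of range for {r.description!r}")
    return sum(r.probability * r.impact_days for r in risks)


# Made-up estimates for a "sync DynamoDB data to a search service" option.
risks = [
    Risk("search index drifts out of sync with DynamoDB", 0.5, 4.0),
    Risk("third-party API breaking change", 0.25, 2.0),
]
print(expected_maintenance_cost(risks))  # 0.5*4.0 + 0.25*2.0 = 2.5
```

The result is only as good as the guesses that go into it, but writing the guesses down makes it easier to compare options side by side, including the as-is implementation.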
The following question prompts might help you get a better handle on these costs and risks. Run through this list for every option you’re considering, including your as-is implementation.
Initial building costs (these apply only to the proposed new option(s), not to the current implementation):
- What code needs to be written?
- What tests need to be written?
- Are other separately deployed components in your architecture (e.g. the front-end web app) required to change or is the change isolated to a single component?
- Are any one-time data migration scripts needed?
- What will the cutover look like in the production environment? Is there an interim period where two solutions might run side by side?
- Can the new service be provisioned with IaC?
- What documentation needs to be updated?
Future risks & ongoing maintenance costs:
- Will the latency or billing cost of the current solution degrade as traffic and/or data volume grows? Can you estimate maximum expected throughput and data volumes and run back-of-the-napkin calculations to gauge this?
- Are there particular categories of bugs that this solution would make more likely? For example, if you duplicate data across multiple stores (as in the DynamoDB-to-Algolia example), there’s a higher chance of issues related to concurrency, consistency and caching.
- Could we be impacted by a privacy or security breach if we introduce a new third-party service provider?
- Could using a new third-party service provider introduce other operational concerns that we haven’t considered? (See “Integrating a third party service into your AWS application” for a list.)
- Does this solution change our automated deployment pipelines or slow us down in any way? For example, can each developer still continue to provision a fully isolated cloud development environment without sharing resources with other team members? Is it still cost-effective for us to do so?
- Any third-party code libraries required will need to be kept up to date. Are these trustworthy?
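As an example of the kind of back-of-the-napkin calculation mentioned in the list above, suppose (hypothetically, since we don’t know how this person’s existing search works) that each search performs a full-table DynamoDB Scan. The sketch below roughly estimates monthly read units as the table grows, assuming one read unit per 4 KB read and half that for eventually consistent reads. The function names and the 100,000 searches/month figure are made up; multiply the result by your region’s on-demand read price to turn it into a billing estimate.

```python
import math


def scan_read_units(table_bytes: int, eventually_consistent: bool = True) -> float:
    # Rough approximation: one read unit per 4 KB scanned, half that for
    # eventually consistent reads. Ignores per-page rounding. Filter
    # expressions don't help here: they're applied after items are read.
    units = math.ceil(table_bytes / 4096)
    return units / 2 if eventually_consistent else float(units)


def monthly_read_units_millions(table_bytes: int, searches_per_month: int) -> float:
    # Every search reads the whole table, so cost scales with size x traffic.
    return scan_read_units(table_bytes) * searches_per_month / 1_000_000


GIB = 1024 ** 3

# Hypothetical growth projection at 100,000 searches per month.
for size_gib in (1, 10, 100):
    millions = monthly_read_units_millions(size_gib * GIB, 100_000)
    print(f"{size_gib:>3} GiB table -> {millions:,.1f}M read units/month")
```

Because every search reads the whole table under this assumption, read units (and, roughly, latency) grow linearly with table size, which is exactly the kind of degradation the first question is meant to surface.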
And one final cost to consider is the opportunity cost. What other features or enhancements could you and your team be building in the time it would take to build and maintain an alternative implementation of this use case?
Indie Cloud Consultant helping small teams learn and build with serverless.