Strategies for syncing denormalized data in DynamoDB


When working with DynamoDB (and indeed with most NoSQL databases), you may find that you need to copy certain data fields into different locations within the same table, or even into a different table. This duplication is often referred to as denormalization.

This is necessary because, unlike relational databases, DynamoDB doesn't support performing joins in a single query. Therefore, in order to get all the data you require in an efficient read operation, you need to store it in the same place as the other data being fetched with it.

There are a few different ways you can implement this copying of data which we’ll cover below.

Example use case

To illustrate each strategy, we'll use the example of an application that uses a single-table design and has User and Organization entities, each stored in separate partitions.

The “master copy” of the User data is stored under a USER-{userId} partition. But we also need to store the User details of the Organization owner under an ORGANIZATION-{orgId} partition.

So whenever the user updates their displayName (say through an AppSync API mutation or API Gateway request), there are two separate DynamoDB items we need to update.
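
To make this concrete, here's a sketch of the two items involved. The key and attribute names (PK/SK, displayName, the OWNER- sort key) are illustrative assumptions, not a prescribed schema:

```python
# Sketch of the two copies of the user's data in a single-table design.
# Key and attribute names are illustrative assumptions.

def master_user_item(user_id: str, display_name: str) -> dict:
    """The "master copy" of the User, in its own partition."""
    return {
        "PK": f"USER-{user_id}",
        "SK": f"USER-{user_id}",
        "displayName": display_name,
    }

def org_owner_item(org_id: str, user_id: str, display_name: str) -> dict:
    """The duplicated owner details, stored inside the Organization partition."""
    return {
        "PK": f"ORGANIZATION-{org_id}",
        "SK": f"OWNER-{user_id}",
        "displayName": display_name,  # denormalized copy that must be kept in sync
    }
```

Whenever displayName changes, both items need to be written; the strategies below differ only in how and when that second write happens.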

Implementation strategies

Strategy 1: In a single atomic transaction

This involves using the TransactWriteItems operation to perform all the necessary changes (Put, Update, Delete) within a single atomic action.

Pros:

  • Copies of data are always consistent
  • No need for separate Lambda function
  • Easy to reason about in codebase as all updates are kept together in single function

Cons:

  • Increased user latency. Transactional writes are marginally slower than a standard put/update, and further latency is added if you need to perform a Get/Query before the TransactWriteItems call in order to fetch the primary keys of all the items to be updated
  • Can’t be (easily) performed in “constrained code” environments such as AppSync VTL resolvers and Step Functions tasks, so you’ll probably need to do it in a Lambda function
  • If a particular data field can be updated from multiple sources (say different API endpoints), then this transactional logic will need to be carried out in each handler. This can be mitigated by keeping all the transactional updates in a shared module, but developers still need to know to use this.
  • The TransactWriteItems API operation has a limit of 100 items that can be written in a single transaction (this was previously 25). So if you have more copies than this (e.g. when copying parent root data into several child entities), you’ll lose the consistency benefit and have to split the API requests into batches.
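
Strategy 1 can be sketched as follows. The function builds the TransactWriteItems request parameters; the table name and key/attribute names are assumptions for illustration, and the actual boto3 call is shown commented out since it requires AWS credentials:

```python
# Sketch of Strategy 1: update both copies of displayName atomically
# in a single TransactWriteItems call.
# Table, key, and attribute names are illustrative assumptions.

def build_displayname_transaction(table: str, user_id: str, org_id: str,
                                  new_name: str) -> dict:
    """Build TransactWriteItems parameters updating the master copy and
    the denormalized copy in one atomic action."""
    update = {
        "UpdateExpression": "SET displayName = :n",
        "ExpressionAttributeValues": {":n": {"S": new_name}},
    }
    return {
        "TransactItems": [
            {"Update": {"TableName": table,
                        "Key": {"PK": {"S": f"USER-{user_id}"},
                                "SK": {"S": f"USER-{user_id}"}},
                        **update}},
            {"Update": {"TableName": table,
                        "Key": {"PK": {"S": f"ORGANIZATION-{org_id}"},
                                "SK": {"S": f"OWNER-{user_id}"}},
                        **update}},
        ]
    }

# Usage (requires AWS credentials and a real table):
# import boto3
# boto3.client("dynamodb").transact_write_items(
#     **build_displayname_transaction("MyTable", "123", "456", "New Name"))
```

If either update fails, the whole transaction is cancelled, which is what keeps the two copies consistent.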

Strategy 2: Asynchronously in a DynamoDB streams handler

This involves the API handler code simply updating the master copy of the User item in the DynamoDB table, with a separate Lambda function, triggered off the table's DynamoDB stream, performing the required “copies”.

Pros:

  • Low-latency for user
  • Guaranteed to run irrespective of what source triggers the update of the master copy item

Cons:

  • Slight delay in updates to master and duplicate copies
  • DynamoDB Streams are noisy: every write to the table reaches the handler, although Lambda event source filtering can now reduce this (see Pros and cons of DynamoDB streams). This can result in complex handler logic in the same function if you have several different denormalized data items. This is particularly an issue for single-table design data models.
  • Risk of an infinite recursion bug in the stream handler if it accidentally updates the master copy again
  • Harder to reason about in codebase as the master copy changes are separate from the duplicates
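
The core of a Strategy 2 stream handler can be sketched as a pure function that inspects each stream record and decides whether a copy needs updating. The key/attribute names and the assumption that the master item stores its orgId are illustrative; note how the PK prefix check doubles as the guard against the infinite-recursion bug mentioned above:

```python
# Sketch of Strategy 2: deriving the duplicate-copy update from a
# DynamoDB stream record. Key/attribute names are illustrative assumptions.
from typing import Optional

def copy_update_for_record(record: dict) -> Optional[dict]:
    """Return UpdateItem parameters for the denormalized copy, or None if
    this stream record isn't a master-User modification. Ignoring non-USER
    partitions also prevents infinite recursion when the handler's own
    writes to the ORGANIZATION partition come back through the stream."""
    if record.get("eventName") != "MODIFY":
        return None
    new_image = record["dynamodb"]["NewImage"]
    pk = new_image["PK"]["S"]
    if not pk.startswith("USER-"):
        return None  # not the master copy; skip (incl. our own duplicate writes)
    org_id = new_image["orgId"]["S"]  # assumes master item stores its org
    return {
        "Key": {"PK": {"S": f"ORGANIZATION-{org_id}"},
                "SK": {"S": pk.replace("USER-", "OWNER-")}},
        "UpdateExpression": "SET displayName = :n",
        "ExpressionAttributeValues": {":n": new_image["displayName"]},
    }
```

In a real handler you'd loop over `event["Records"]`, call this for each record, and issue an UpdateItem for each non-None result.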

Strategy 3: Asynchronously via an EventBridge handler

This involves the API handler code updating the master copy of the item in DynamoDB and then publishing a USER_UPDATED event to EventBridge. A separate Lambda handler would subscribe to this event and perform the required “copies”.

Pros:

  • Doesn’t require DynamoDB reads before performing the write
  • Can maintain single-purpose Lambda functions

Cons:

  • Slightly higher user-facing latency given the extra network call to EventBridge
  • Slight delay in updates to master and duplicate copies
  • Can’t be (easily) performed in “constrained code” environments such as AppSync VTL resolvers and Step Functions tasks, so you’ll probably need to do it in a Lambda function
  • If a particular data field can be updated from multiple sources (say different API endpoints), then this EventBridge publishing logic will need to be carried out in each handler. This can be mitigated by keeping all the denormalized updates in a shared module, but developers still need to know to use this.
  • Rollback code may be required. Since this effectively performs a distributed transaction (a write to DynamoDB followed by a write to EventBridge) within a single Lambda function, you need a try-catch around the EventBridge call and, if a transient error occurs there, code to roll back the DynamoDB update before returning an error to the user. It’s highly unlikely to fail, but if it does and there is no rollback code, the data in DynamoDB will be left inconsistent
  • Harder to reason about in codebase as the master copy changes are separate from the duplicates
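
The write-then-publish flow with rollback can be sketched as below. The event bus, source, and detail-type names are assumptions, and the DynamoDB/EventBridge calls are injected as callables so the rollback logic is visible (and testable) without AWS:

```python
# Sketch of Strategy 3: update the master copy, publish USER_UPDATED to
# EventBridge, and roll back the DynamoDB write if the publish fails.
# Bus/source/detail-type names are illustrative assumptions.
import json

def build_user_updated_event(user_id: str, new_name: str) -> dict:
    """Build the PutEvents entry the handler publishes after the write."""
    return {
        "EventBusName": "app-bus",
        "Source": "my-app.users",
        "DetailType": "USER_UPDATED",
        "Detail": json.dumps({"userId": user_id, "displayName": new_name}),
    }

def update_user(ddb_update, eb_publish, user_id: str,
                old_name: str, new_name: str) -> None:
    """ddb_update/eb_publish are injected callables (e.g. wrapping boto3)."""
    ddb_update(user_id, new_name)  # 1. update the master copy
    try:
        eb_publish(build_user_updated_event(user_id, new_name))  # 2. publish
    except Exception:
        ddb_update(user_id, old_name)  # 3. roll back so copies stay consistent
        raise
```

A separate Lambda subscribed to USER_UPDATED would then perform the denormalized copies, exactly as the stream handler does in Strategy 2.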

Deciding between these strategies

The pros and cons of each strategy will have different weights depending on your use case.

My default approach would be strategy 1 as it has the fewest moving parts, and when all other factors are (almost) equal, I like to optimise for greater code maintainability. But if your context requires a very fast user response and you need to perform several reads to gather the data items to be updated, you may opt for 2 or 3.
