Strategies for syncing denormalized data in DynamoDB
When working with DynamoDB (and indeed with most NoSQL databases), you may find that you need to copy certain data fields into different locations within the same table, or even into a different table. This duplication is often referred to as denormalization.
This is needed because, unlike relational databases, DynamoDB doesn’t support performing joins in a single query. So, to get all the data you require in a single efficient read operation, you need to store it in the same place as the other data being fetched with it.
There are a few different ways you can implement this copying of data which we’ll cover below.
Example use case
To illustrate each strategy we’ll use the example of an application which uses a Single-table design data model and has User and Organization entities, each stored in separate partitions.
The “master copy” of the User data is stored under a USER-{userId} partition. But we also need to store the User details of the Organization owner under an ORGANIZATION-{orgId} partition.
So whenever the user updates their displayName (say through an AppSync API mutation or API Gateway request), there are 2 separate DynamoDB items we need to update.
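To make the example concrete, here’s a sketch of the two items holding copies of the same User data. It assumes a single-table design with generic PK/SK attributes; the attribute and key names are illustrative, not prescriptive.

```python
def master_user_item(user_id: str, display_name: str) -> dict:
    """The authoritative ("master") copy of the User."""
    return {
        "PK": f"USER-{user_id}",
        "SK": f"USER-{user_id}",
        "displayName": display_name,
    }


def org_owner_item(org_id: str, user_id: str, display_name: str) -> dict:
    """The denormalized copy of the owner's details, stored inside
    the Organization partition so it can be fetched in the same
    query as the rest of the Organization's data."""
    return {
        "PK": f"ORGANIZATION-{org_id}",
        "SK": f"OWNER-{user_id}",
        "displayName": display_name,
    }
```

Whenever `displayName` changes, both of these items must be written, which is exactly what the strategies below address.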
Implementation strategies
Strategy 1: In a single atomic transaction
This involves using the TransactWriteItems operation to perform all the necessary changes (Put, Update, Delete) within a single atomic action.
Pros:
- Copies of data are always consistent
- No need for separate Lambda function
- Easy to reason about in codebase as all updates are kept together in single function
Cons:
- Increased user latency. A transactional write is marginally slower than a standard put/update, and further latency is added if you need to perform a Get/Query before the TransactWrite in order to fetch the primary keys of all the items to be updated
- Can’t be (easily) performed in “constrained code” environments such as AppSync VTL resolvers and StepFunctions tasks, so you’ll probably need to do it in a Lambda function
- If a particular data field can be updated from multiple sources (say different API endpoints), then this transactional logic will need to be carried out in each handler. This can be mitigated by keeping all the transactional updates in a shared module, but developers still need to know to use this.
- The TransactWriteItems API operation has a limit of 100 items that can be written in a single transaction (previously 25). So if you have more copies than this (e.g. when copying parent root data into several child entities), you’ll lose the consistency benefit and have to batch the API requests.
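A sketch of this strategy, building the TransactWriteItems parameters that update both copies of `displayName` atomically. The table and key names carry over from the example above and are assumptions; the pure parameter-building is separated out so it’s easy to test.

```python
def build_display_name_transaction(
    table: str, user_id: str, org_id: str, new_name: str
) -> dict:
    """Build TransactWriteItems params updating the master User item
    and the denormalized copy in the Organization partition together."""
    update = {
        "UpdateExpression": "SET displayName = :n",
        "ExpressionAttributeValues": {":n": {"S": new_name}},
    }
    return {
        "TransactItems": [
            {   # master copy in the USER partition
                "Update": {
                    "TableName": table,
                    "Key": {
                        "PK": {"S": f"USER-{user_id}"},
                        "SK": {"S": f"USER-{user_id}"},
                    },
                    **update,
                }
            },
            {   # denormalized copy in the ORGANIZATION partition
                "Update": {
                    "TableName": table,
                    "Key": {
                        "PK": {"S": f"ORGANIZATION-{org_id}"},
                        "SK": {"S": f"OWNER-{user_id}"},
                    },
                    **update,
                }
            },
        ]
    }


# In the Lambda handler you would then pass these params to the SDK, e.g.:
#   boto3.client("dynamodb").transact_write_items(**params)
```

Because both updates sit in one request, either both copies change or neither does.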
Strategy 2: Asynchronously in a DynamoDB streams handler
This involves the API handler code simply updating the master copy of the User item in the DynamoDB table, with a separate Lambda function triggered off the DynamoDB stream to perform the required “copies”.
Pros:
- Low-latency for user
- Guaranteed to run irrespective of what source triggers the update of the master copy item
Cons:
- Slight delay in updates to master and duplicate copies
- DynamoDB Streams are noisy and don’t allow filtering (see Pros and cons of DynamoDB streams). This can result in complex handler logic in the same function if you have several different denormalized data items. This is particularly an issue for single-table design data models.
- Risk of an infinite recursion bug if the stream handler accidentally updates the master copy again
- Harder to reason about in codebase as the master copy changes are separate from the duplicates
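A sketch of the stream handler’s filtering logic, using the DynamoDB streams record format. It ignores everything except modifications to master USER items and returns the copy writes to apply; the key and attribute names are the assumptions from the example above.

```python
def handle_stream(event: dict) -> list:
    """Extract the denormalized-copy updates implied by a batch of
    DynamoDB stream records (filtering must be done in code, since
    the stream delivers every write to the table)."""
    copies = []
    for record in event.get("Records", []):
        if record.get("eventName") != "MODIFY":
            continue
        new_image = record["dynamodb"]["NewImage"]
        pk = new_image["PK"]["S"]
        # Only react to master USER items. This guard is also what
        # protects against the infinite-recursion risk of reacting
        # to our own copy writes.
        if not pk.startswith("USER-"):
            continue
        copies.append({
            "userId": pk[len("USER-"):],
            "displayName": new_image["displayName"]["S"],
        })
    return copies
```

In a real handler, each entry in `copies` would drive an UpdateItem against the corresponding ORGANIZATION partition.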
Strategy 3: Asynchronously via an EventBridge handler
This involves the API handler code updating the master copy of the item in DynamoDB and then publishing a USER_UPDATED event to EventBridge. A separate Lambda handler would subscribe to this event and perform the required “copies”.
Pros:
- Doesn’t require DynamoDB reads before performing the write
- Can maintain single-purpose Lambda functions
Cons:
- Slightly slower user-facing latency given extra network call to EventBridge
- Slight delay in updates to master and duplicate copies
- Can’t be (easily) performed in “constrained code” environments such as AppSync VTL resolvers and StepFunctions tasks, so you’ll probably need to do it in a Lambda function
- If a particular data field can be updated from multiple sources (say different API endpoints), then this EventBridge publishing logic will need to be carried out in each handler. This can be mitigated by keeping all the denormalized updates in a shared module, but developers still need to know to use this.
- Rollback code may be required. Since this is effectively a distributed transaction (a write to DynamoDB and a publish to EventBridge) within a single Lambda function, we need to wrap the EventBridge write in a try-catch and, in the situation that a transient error occurs in EventBridge, roll back the DynamoDB update and then return an error to the user. Such a failure is highly unlikely, but if it happens and there is no rollback code, the data in DynamoDB will be left inconsistent
- Harder to reason about in codebase as the master copy changes are separate from the duplicates
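The rollback concern can be sketched as follows. The DynamoDB and EventBridge clients are injected as plain callables so the control flow is visible (and testable) without real AWS calls; the function and event names are illustrative assumptions, not a prescribed API.

```python
def update_display_name(ddb, events, user_id: str,
                        old_name: str, new_name: str) -> None:
    """Strategy 3: update the master copy, then publish USER_UPDATED.
    If the publish fails, roll back the DynamoDB write so the two
    systems don't end up disagreeing."""
    ddb.put(user_id, new_name)  # 1. update the master copy
    try:
        # 2. publish so a subscribing Lambda can sync the duplicates
        events.publish({
            "type": "USER_UPDATED",
            "userId": user_id,
            "displayName": new_name,
        })
    except Exception:
        # 3. EventBridge failed: restore the previous value, then
        #    surface the error to the caller/user
        ddb.put(user_id, old_name)
        raise
```

In production the two callables would wrap `PutItem`/`UpdateItem` and EventBridge `PutEvents` respectively.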
Deciding between these strategies
The pros and cons of each strategy will have different weights depending on your use case.
My default approach would be strategy 1 as it has the fewest moving parts, and when all other factors are (almost) equal, I like to optimise for greater code maintainability. But if your context requires a very fast user response and you need to perform several reads to gather the data items to be updated, you may opt for 2 or 3.