The DynamoDB Tools Gap
DynamoDB is big right now. Within the past six months its popularity seems to have exploded, at least in the online circles I frequent.
I have previously compared the multi-table and single-table approaches to DynamoDB data modelling, and at the time I was somewhat skeptical about the widespread adoption of the single-table approach, especially for apps in the early stages of development. Since writing that article, the landscape has changed and some of my concerns about the single-table approach have lessened. A proper community is forming and I can see standard practices starting to emerge.
Alex DeBrie’s recently launched DynamoDB Book plugs a huge education gap for developers trying to understand the best way to use DynamoDB as the main datastore for a serverless application, especially for those coming from an RDBMS background. Jeremy Daly’s DynamoDB Toolbox promises to take away many of the pains of dealing with a single-table model inside your application code. And the new DynamoDB Place, also created by Alex, provides a forum for detailed discussion.
But there’s still a long way to go before DynamoDB development and operational practices are as well established as those in the RDBMS world.
Today I want to cover a few problem scenarios that I still encounter when working on DynamoDB projects with clients and that burn precious development time. All of them can be attributed to missing or immature tooling.
Direct data manipulation by non-backend engineers
Frontend developers who are consuming APIs backed by DynamoDB and QA engineers who are performing manual tests should have the ability to directly change data for items in the database. Of course this should never happen in a production environment, but during the development process it’s an unacceptable bottleneck to require folks to go via backend developers who understand the single-table model design, especially in distributed teams working across different timezones.
DynamoDB GUI clients such as Dynobase are a great improvement over the AWS Console and allow you to quickly access multiple DynamoDB databases across different accounts and edit individual items.
However, the person making the edit still has to find the DynamoDB item they need to change. To do that, they need to either:
- have a good understanding of the generically named composite fields used to index the entity they’re interested in; or
- use a tool that abstracts this indexing detail away and lets them work with the entity concepts they’re already familiar with
The first option is hard and probably an unrealistic expectation, and as far as I know the tool for the second option does not exist yet. I would love to see a DynamoDB admin GUI tool that combines the concept of Postman’s collections of requests (shareable between all team members) and NoSQL Workbench’s “Facet” concept.
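To make the second option concrete, here is a minimal Python sketch of the kind of abstraction such a tool would need at its core: a shared, per-entity description of how keys are built, so that a teammate only supplies business identifiers. The attribute names (PK, SK), the key formats and the helper names key_for and get_entity are all hypothetical, and table is assumed to behave like a boto3 Table resource.

```python
from typing import Any, Dict, Optional

# Hypothetical key templates for a single-table design. The attribute names
# (PK, SK) and the "USER#..." / "ORDER#..." formats are assumptions made for
# illustration; a real table would define its own.
KEY_TEMPLATES: Dict[str, Dict[str, str]] = {
    "User": {"PK": "USER#{user_id}", "SK": "PROFILE#{user_id}"},
    "Order": {"PK": "USER#{user_id}", "SK": "ORDER#{order_id}"},
}


def key_for(entity: str, **ids: str) -> Dict[str, str]:
    """Build the primary key for an entity from its business identifiers."""
    try:
        template = KEY_TEMPLATES[entity]
    except KeyError:
        raise ValueError(f"Unknown entity type: {entity}") from None
    return {attr: pattern.format(**ids) for attr, pattern in template.items()}


def get_entity(table: Any, entity: str, **ids: str) -> Optional[Dict[str, Any]]:
    """Fetch one entity by its identifiers, or None if it does not exist.

    `table` is assumed to expose get_item(Key=...) like a boto3 Table resource.
    """
    response = table.get_item(Key=key_for(entity, **ids))
    return response.get("Item")
```

With something like this shared across the team, a QA engineer could ask for get_entity(table, "Order", user_id="123", order_id="9") without ever needing to know how the composite keys are laid out.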
Schema migrations
This was one of my primary concerns about choosing a single-table DynamoDB design when I first started getting into it. When my application access patterns changed (which they inevitably would), how would I update my indexes?
I’m not as worried about it now and have found that the need to perform a migration arises much less frequently than I had anticipated.
I think this problem is understated in the RDBMS world: a SQL schema needs to change much more frequently, and this definitely slows down development velocity. This was one of the main reasons why I chose to move away from SQL towards MongoDB many years ago.
However, tools such as Rails Migrations have eased this pain for many SQL-backed applications. The high-level strategies and low-level primitives for performing schema migrations in DynamoDB are becoming clearer (check out Chapter 15 of the DynamoDB Book for a detailed list of migration strategies), but they haven’t yet been solidified into tools that a team can quickly pick up.
Here are a few things I’d like to see in such a tool:
- A generic scanning engine that can iterate over an entire DynamoDB table, or a subset of it, and execute a user-supplied function that maps each original item to a new item.
- The engine should be runnable in the cloud (e.g. from a Lambda function), to make IAM permissions simpler and to allow for robustness (dealing with partial failures, etc.).
- To provide idempotency, migrations should be self-aware and know whether they’ve already been applied or partially applied (e.g. by storing metadata items within the DynamoDB table they’re updating).
- Nice-to-have: Infrastructure-as-Code or CLI support so that a migration can be easily deployed and executed across multiple environments as part of a CI/CD pipeline.
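The first and third items on this wish list can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an existing tool: table is assumed to behave like a boto3 Table resource (scan, get_item, put_item), the key attributes are assumed to be named PK and SK, and the MIGRATION#... marker item and its status values are hypothetical. It also assumes the transform keeps each item's primary key unchanged and is safe to re-apply, so that a partially applied run can simply be resumed.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Item = Dict[str, Any]

# Assumed key attribute names; adjust to match the real table.
PK, SK = "PK", "SK"


def scan_items(table: Any, **scan_kwargs: Any) -> Iterator[Item]:
    """Yield every item from a paginated Scan, following LastEvaluatedKey."""
    while True:
        page = table.scan(**scan_kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return
        scan_kwargs["ExclusiveStartKey"] = last_key


def run_migration(
    table: Any, name: str, transform: Callable[[Item], Optional[Item]]
) -> int:
    """Apply `transform` to each item; return the number of items rewritten.

    A hypothetical metadata item stored in the same table records whether the
    migration has completed, so a second run becomes a no-op.
    """
    marker_key = {PK: f"MIGRATION#{name}", SK: "STATUS"}
    marker = table.get_item(Key=marker_key).get("Item")
    if marker and marker.get("status") == "COMPLETE":
        return 0  # already applied, nothing to do

    table.put_item(Item={**marker_key, "status": "IN_PROGRESS"})
    rewritten = 0
    for item in scan_items(table):
        if str(item.get(PK, "")).startswith("MIGRATION#"):
            continue  # skip migration metadata items
        new_item = transform(item)
        # transform returns None (or an identical item) to leave it untouched
        if new_item is not None and new_item != item:
            table.put_item(Item=new_item)
            rewritten += 1
    table.put_item(Item={**marker_key, "status": "COMPLETE"})
    return rewritten
```

A migration then becomes a named function, for example one that returns {**item, "email_lower": item["email"].lower()} for user items and None for everything else. A real tool would still need the harder parts: changing keys (which means deleting the old item), throttling, and checkpointing for tables too large to scan in a single Lambda invocation.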
Ad hoc bulk deletion of data
I write a lot of integration tests when building serverless applications. Most of these tests write data to a real DynamoDB table. The tests get run as “acceptance” tests as a post-deploy step in my CI/CD pipelines and run against both pre-production and production environments (so blowing the entire table away isn’t an option). There are a few strategies I use to ensure that the tests clean up any data they create in teardown steps, but in asynchronous systems this can be hard to control and sometimes test data gets left behind.
In addition to this, sometimes bugs creep in whereby duplicate data gets created and I need to bulk delete the duplicates.
Both these scenarios effectively come down to the same problem as the schema migration — identifying a set of items from the table and performing an operation (in this case a delete) on them using a single bulk command.
The scanning engine solution I proposed for migrations would probably solve this issue too, but given that this is more of an ad hoc activity, a simpler solution that’s executed within a GUI would suffice for my needs.
When building databases with MongoDB, I used a GUI client called Robomongo (now Robo 3T) that provided a REPL where I could quickly write and run JavaScript-based scripts that composed multiple MongoDB commands into a single operation against my database. I would love something like this for DynamoDB.
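In the meantime, the scan, filter and delete operation described above can be approximated with a short script. The following is a minimal Python sketch, assuming table behaves like a boto3 Table resource (scan and batch_writer) and that the key attributes are named PK and SK; the bulk_delete helper and its dry_run flag are hypothetical names of my own.

```python
from typing import Any, Callable, Dict, Sequence

Item = Dict[str, Any]


def bulk_delete(
    table: Any,
    should_delete: Callable[[Item], bool],
    key_attrs: Sequence[str] = ("PK", "SK"),  # assumed key attribute names
    dry_run: bool = True,
) -> int:
    """Scan the whole table and delete every item matching `should_delete`.

    Returns the number of matching items. With dry_run=True (the default),
    nothing is deleted, so the match count can be checked first.
    """
    matched = 0
    scan_kwargs: Dict[str, Any] = {}
    # batch_writer buffers the deletes and sends them as batched writes
    with table.batch_writer() as batch:
        while True:
            page = table.scan(**scan_kwargs)
            for item in page.get("Items", []):
                if should_delete(item):
                    matched += 1
                    if not dry_run:
                        batch.delete_item(Key={k: item[k] for k in key_attrs})
            if "LastEvaluatedKey" not in page:
                break
            scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
    return matched
```

For leftover test data this might be called as bulk_delete(table, lambda item: item["PK"].startswith("TEST#")), where the TEST# prefix is again just an assumed convention, and then repeated with dry_run=False once the count looks right. A full scan reads every item in the table, so this only suits occasional clean-up.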
Who’s going to build this?
Maybe I’m being overly optimistic, but I would love to see DynamoDB single-table design become the de facto approach for building databases for serverless apps on AWS. Its scalability story is already sewn up and its education story is much improved, but its developer productivity and tooling story still has some way to go to win over SQL diehards.
Are you working on a tool that could help in any of these areas, or do you know someone who is? If so, I’d love to hear about it.