Inter-service communication channels for serverless microservices in AWS
When building a serverless application, a time will come when you need to decide how to split it up into multiple services.
Maybe you started with a single-service monolithic design, but your application has grown in size and complexity and you now want to split things apart. Or maybe you're just starting a project and already have an idea of distinct services with clear responsibilities.
Either way, a major part of your service design process will involve deciding how your services will communicate with each other.
I recently talked more generally about the different methods of performing async communication in AWS. This article is related, but here I’m going to focus on the inter-service aspect, and specifically on the different channels you can use to have services talk to each other, and the pros and cons of each.
Why bother splitting a solution into (micro)services?
Before we dig in further, it’s worth taking a second to go back to the fundamental goals of microservice design:
- Small components of the system can be deployed independently, allowing for faster and safer releases.
- Runtime attributes such as availability, scalability and latency are decoupled across services.
- Different teams can work relatively independently on each service with just a shared contract between them.
- Cross-cutting functionality can be centralised and reused.
With these benefits in mind, let’s look at the different channels that your custom services can use to communicate with each other within the AWS serverless ecosystem.
API Gateway (sync)
Use Case:
Service A calls Service B synchronously over HTTP in order to fetch data that it requires to process a request. This is facilitated by an API Gateway endpoint exposed by Service B.
Pros:
- Service A doesn’t need to maintain its own copy of Service B’s data.
- The standard HTTP interface makes it easy for microservices in your system that aren't hosted on AWS to integrate with it too.
Cons:
- Service A's latency and availability are now dependent on those of Service B. If Service A is user-facing, then Service B's response time is now on the critical path.
- Changes to Service B have potential to introduce errors in Service A (and to the user).
- A separate API Gateway endpoint along with its associated auth needs to be configured.
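To make the latency coupling concrete, here is a minimal Python sketch of Service A calling Service B's endpoint with an explicit timeout, so that a slow or unavailable Service B degrades Service A's response instead of hanging it. The base URL, the `/customers/{id}` route and the `get_customer` function are hypothetical, and auth (e.g. IAM signing or an API key) is omitted for brevity.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional

# Hypothetical base URL of Service B's API Gateway stage (injected at deploy time in practice).
SERVICE_B_BASE_URL = "https://example.execute-api.eu-west-1.amazonaws.com/prod"

# A fetcher takes (url, timeout_seconds) and returns the decoded JSON body.
Fetcher = Callable[[str, float], dict]


def http_get_json(url: str, timeout: float) -> dict:
    """GET a URL and decode its JSON body; raises on HTTP or network errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def get_customer(
    customer_id: str,
    fetch: Fetcher = http_get_json,
    timeout: float = 2.0,
) -> Optional[dict]:
    """Fetch a customer from Service B, returning None if B is slow or unavailable.

    Service B's response time sits on Service A's critical path, so the timeout
    bounds how long A waits before falling back.
    """
    url = f"{SERVICE_B_BASE_URL}/customers/{customer_id}"
    try:
        return fetch(url, timeout)
    except (urllib.error.URLError, TimeoutError):
        # HTTPError (4xx/5xx responses) is a subclass of URLError, so it is caught too.
        return None
```

Whether Service A should degrade gracefully on `None` or surface an error to the user depends on how essential Service B's data is to the request.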
Lambda Function (sync)
Use Case:
Service A calls Service B synchronously over HTTP in order to fetch data that it requires to process a request. This is facilitated by a Lambda function exposed by Service B, which Service A invokes directly through the AWS Lambda Invoke API.
Pros:
- Service A doesn’t need to maintain its own copy of Service B’s data.
- No need for a separate API Gateway endpoint, as the AWS Lambda Invoke API is already callable over HTTP.
Cons:
- Service A's latency and availability are now dependent on those of Service B. If Service A is user-facing, then Service B's response time is now on the critical path.
- Changes to Service B have potential to introduce errors in Service A (and to the user).
- At deploy-time, Service A needs to know the function ARN of Service B’s Lambda function, both in order to communicate with it and to configure IAM permissions.
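A minimal sketch of the direct invocation from Service A, assuming a boto3 Lambda client (`boto3.client("lambda")`) and a Service B function that accepts and returns JSON. The ARN, the `SERVICE_B_FUNCTION_ARN` environment variable and the payload shape are hypothetical. Service A's execution role would also need `lambda:InvokeFunction` permission on that ARN.

```python
import json
import os
from typing import Any

# Hypothetical: the deploy pipeline injects Service B's function ARN as an env var.
SERVICE_B_FUNCTION_ARN = os.environ.get(
    "SERVICE_B_FUNCTION_ARN",
    "arn:aws:lambda:eu-west-1:123456789012:function:service-b-get-customer",
)


class ServiceBError(Exception):
    """Raised when Service B's function reports an error."""


def get_customer(lambda_client: Any, customer_id: str) -> dict:
    """Synchronously invoke Service B's function and return its decoded JSON result.

    `lambda_client` is expected to be a boto3 Lambda client, e.g. boto3.client("lambda").
    """
    response = lambda_client.invoke(
        FunctionName=SERVICE_B_FUNCTION_ARN,
        InvocationType="RequestResponse",  # synchronous: wait for B's result
        Payload=json.dumps({"customerId": customer_id}).encode("utf-8"),
    )
    # Payload is a streaming body holding B's return value (or its error details).
    body = json.loads(response["Payload"].read())
    # A function error does not make the Invoke call itself fail; it is flagged via FunctionError.
    if response.get("FunctionError"):
        raise ServiceBError(f"Service B failed: {body}")
    return body
```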
DynamoDB Streams (async)
Use Case:
Service A contains a DynamoDB table. Changes to this table are fed to a DynamoDB stream which Service B consumes.
Pros:
- The availability and latency of Service A are independent of those of the resources (Lambda functions and downstream services) inside Service B.
- Service A is already writing to a DynamoDB table, so no extra work is required to expose these messages to other services.
- Multiple downstream services can subscribe to the stream, although AWS recommends no more than two concurrent readers per stream shard to avoid throttling.
Cons:
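- DynamoDB Streams themselves have no filtering mechanism, so by default Service B is invoked for every change to the table, including changes it isn't interested in. Lambda event filtering (added in late 2021) can discard records before they reach Service B's function, but without it, large single-table designs can generate a lot of redundant invocations. Either way, Service B must be granted read access to the whole stream, which risks leaking data that Service B shouldn't be allowed to access.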
- Messages in the stream use a low-level database item format. Using this schema as the interface contract means that any change to Service A's data access logic (e.g. adding a new composite index field) must take care not to break the downstream services consuming these messages.
- At deploy-time, Service B needs to know the ARN of Service A’s DynamoDB stream.
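The sketch below shows what Service B ends up coupled to: a Python Lambda handler that receives DynamoDB stream records, skips item types it doesn't care about and decodes attributes from DynamoDB's typed format (`{"S": ...}`, `{"N": ...}`). The `Records[].eventName` and `dynamodb.NewImage` fields follow the stream event shape (with a `NEW_IMAGE` or `NEW_AND_OLD_IMAGES` view type); the single-table key layout (`pk` values prefixed with `ORDER#`) and the attribute names are hypothetical.

```python
from decimal import Decimal
from typing import Any, Dict, List

# DynamoDB stream images use typed attribute values, e.g. {"S": "abc"} or {"N": "42.5"}.
Image = Dict[str, Dict[str, Any]]


def read_attr(image: Image, name: str) -> Any:
    """Decode a string or number attribute from a stream image (other types omitted)."""
    value = image[name]
    if "S" in value:
        return value["S"]
    if "N" in value:
        return Decimal(value["N"])  # numbers arrive as strings
    raise ValueError(f"unsupported attribute type for {name!r}: {value}")


def handler(event: Dict[str, Any], context: Any = None) -> List[Dict[str, Any]]:
    """Service B's consumer: it is invoked for changes to Service A's table."""
    new_orders = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # ignore updates and deletes
        image = record["dynamodb"].get("NewImage", {})
        # Service B has to know Service A's (hypothetical) key scheme to skip other item types.
        if not image.get("pk", {}).get("S", "").startswith("ORDER#"):
            continue
        new_orders.append({"orderId": read_attr(image, "orderId"), "total": read_attr(image, "total")})
    # Returned only so the sketch is easy to inspect; real processing would happen here.
    return new_orders
```

Every renamed attribute or changed key prefix in Service A's table is a potential breaking change for this handler.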
SQS Queue (async)
Use Case:
Service A needs to perform a common task that multiple other services need to perform (e.g. sending an email using a consistent template format). Service B implements this task.
Pros:
- The availability and latency of Service A are independent of those of the resources (Lambda functions and downstream services) inside Service B (with the exception of the queue itself).
- Queues are a natural fit for task-based messages rather than domain-event based messages.
- Allows for a strict processing order if required (using FIFO queues).
Cons:
- Messages can only have one downstream subscriber/processor.
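As a sketch of the task-queue pattern, here is Service B's Python consumer for a hypothetical "send email" queue. It assumes the Lambda SQS event source mapping has `ReportBatchItemFailures` enabled, so that returning `batchItemFailures` leaves only the failed messages on the queue for retry. The message body fields and the `send_email` callable are hypothetical stand-ins for Service B's real email logic.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical task message body agreed between producers and Service B:
# {"template": "welcome", "to": "user@example.com", "params": {...}}
SendEmail = Callable[[str, str, Dict[str, Any]], None]
BatchResponse = Dict[str, List[Dict[str, str]]]


def make_handler(send_email: SendEmail) -> Callable[..., BatchResponse]:
    """Build Service B's SQS handler around a (hypothetical) send_email function."""

    def handler(event: Dict[str, Any], context: Any = None) -> BatchResponse:
        failures: List[Dict[str, str]] = []
        for record in event.get("Records", []):
            try:
                task = json.loads(record["body"])
                send_email(task["template"], task["to"], task.get("params", {}))
            except Exception:
                # Report only this message as failed; the successful messages in the
                # batch are deleted and this one becomes visible again for retry.
                failures.append({"itemIdentifier": record["messageId"]})
        return {"batchItemFailures": failures}

    return handler
```

On the producing side, any service can enqueue a task with `sqs_client.send_message(QueueUrl=queue_url, MessageBody=json.dumps(task))`; a redrive policy with a dead-letter queue keeps permanently failing tasks from being retried forever.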
PubSub — SNS or EventBridge (async)
Use Case:
An event occurs in Service A and it wants to let other services know about it, without needing any data from them or any knowledge of who they are. Service B is interested in this event and so subscribes to it. Service A publishes a message to a topic, and Service B receives and processes it. This use case can be served by either an SNS topic or an EventBridge event bus.
Pros:
- The availability and latency of Service A are independent of those of the resources (Lambda functions and downstream services) inside Service B (with the exception of the topic or event bus itself).
- Multiple services can consume a single event message.
- Service B only needs knowledge of the domain event message schema in order to process the message.
Cons:
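- When a Lambda function subscribes directly, it receives one message per invocation rather than a batch (subscribing an SQS queue to the topic or event bus and having the function consume that queue restores batching).
- With standard SNS topics and EventBridge, there's no guarantee of message delivery order to downstream services (SNS FIFO topics can preserve ordering when delivering to SQS FIFO queues).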
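A minimal Python sketch of the publishing side using EventBridge, assuming a boto3 EventBridge client (`boto3.client("events")`). The bus name, the `service-a.orders` source and the `OrderPlaced` detail type are hypothetical; the point is that Service A publishes a domain-level event without knowing who consumes it.

```python
import json
from typing import Any

# Hypothetical names; the bus would typically be owned by a shared stack.
EVENT_BUS_NAME = "app-events"
SOURCE = "service-a.orders"


def publish_order_placed(events_client: Any, order_id: str, total: str) -> None:
    """Publish an OrderPlaced domain event; Service A does not know who subscribes."""
    response = events_client.put_events(
        Entries=[
            {
                "EventBusName": EVENT_BUS_NAME,
                "Source": SOURCE,
                "DetailType": "OrderPlaced",
                # The detail is the public contract, not Service A's storage format.
                "Detail": json.dumps({"orderId": order_id, "total": total}),
            }
        ]
    )
    # PutEvents can partially fail without raising, so check the failure count.
    if response.get("FailedEntryCount", 0) > 0:
        raise RuntimeError(f"failed to publish OrderPlaced: {response['Entries']}")
```

Service B would then subscribe with an EventBridge rule whose event pattern is `{"source": ["service-a.orders"], "detail-type": ["OrderPlaced"]}`, so it depends only on the event schema rather than on anything inside Service A.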
PubSub resources as standalone microservices
The PubSub channels above (SNS and EventBridge) require a topic or event bus to be set up before either the publishing or the subscribing services can function. This raises the question: which service owns the channel (the topic or the event bus)? It often doesn't make sense for either to own it, so in this case the channel can be deployed as a standalone service in its own right, or as part of a core-infra stack that is deployed before all other services.
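One possible way (an assumption, not the only option) for services to locate a channel they don't own is for the core-infra stack to write the event bus name to an SSM Parameter Store parameter that publishers and subscribers read. The sketch below assumes a boto3 SSM client (`boto3.client("ssm")`) and a hypothetical parameter name `/core-infra/event-bus-name`; resolving the same parameter at deploy time into an environment variable would work equally well and avoids the runtime lookup.

```python
from typing import Any, Dict

# Hypothetical parameter written by the core-infra stack when it deploys the shared bus.
EVENT_BUS_PARAM = "/core-infra/event-bus-name"

# Module-level cache, so the lookup happens once per Lambda execution environment.
_cache: Dict[str, str] = {}


def get_event_bus_name(ssm_client: Any) -> str:
    """Return the shared event bus name, fetching it from Parameter Store on first use."""
    if EVENT_BUS_PARAM not in _cache:
        response = ssm_client.get_parameter(Name=EVENT_BUS_PARAM)
        _cache[EVENT_BUS_PARAM] = response["Parameter"]["Value"]
    return _cache[EVENT_BUS_PARAM]
```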
Summary Recommendations
To finish off, here’s a list of guidelines to help you design your inter-service communication channels:
- Avoid synchronous inter-service communications as far as possible.
- Prefer PubSub as your main asynchronous communication method for domain events.
- Use SQS for small task-based services focused around shared functionality.
- If you need to react to changes to a DynamoDB table, consider keeping the DynamoDB stream inside the service boundary and instead have an internal Lambda function subscribe to it, filter out irrelevant messages, transform the message into a domain event schema and then publish it to SNS or EventBridge.
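The last recommendation can be sketched as a single internal Python Lambda function in Service A: it reads the table's stream, drops item types that other services shouldn't see, maps each remaining item into a domain event and publishes it to EventBridge. This assumes a stream view type that includes `NEW_IMAGE` and a boto3 EventBridge client; the key prefix, attribute names, bus name and event names are all hypothetical.

```python
import json
from typing import Any, Callable, Dict, List, Optional

# Hypothetical names shared with subscribing services.
EVENT_BUS_NAME = "app-events"
SOURCE = "service-a.orders"
MAX_ENTRIES_PER_CALL = 10  # PutEvents accepts at most 10 entries per request


def to_domain_event(record: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Map a raw stream record to an OrderPlaced PutEvents entry, or None to drop it."""
    if record.get("eventName") != "INSERT":
        return None
    image = record["dynamodb"].get("NewImage", {})
    if not image.get("pk", {}).get("S", "").startswith("ORDER#"):
        return None  # other item types never leave the service boundary
    # Only the fields in the public contract are exposed, as plain JSON values.
    detail = {"orderId": image["orderId"]["S"], "total": image["total"]["N"]}
    return {
        "EventBusName": EVENT_BUS_NAME,
        "Source": SOURCE,
        "DetailType": "OrderPlaced",
        "Detail": json.dumps(detail),
    }


def make_handler(events_client: Any) -> Callable[..., int]:
    """Build the stream handler around an EventBridge client, e.g. boto3.client("events")."""

    def handler(event: Dict[str, Any], context: Any = None) -> int:
        entries: List[Dict[str, Any]] = []
        for record in event.get("Records", []):
            entry = to_domain_event(record)
            if entry is not None:
                entries.append(entry)
        for start in range(0, len(entries), MAX_ENTRIES_PER_CALL):
            response = events_client.put_events(Entries=entries[start:start + MAX_ENTRIES_PER_CALL])
            if response.get("FailedEntryCount", 0) > 0:
                # Raising makes Lambda retry the stream batch, which may republish
                # entries that already succeeded, so subscribers should be idempotent.
                raise RuntimeError(f"{response['FailedEntryCount']} event(s) failed to publish")
        return len(entries)

    return handler
```

With `handler = make_handler(boto3.client("events"))` as the entry point, subscribers only ever see `OrderPlaced` events and never the table's key scheme, so Service A can change its data access logic freely as long as the event schema stays stable.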