Use the Errors section of the docs to make your AWS integrations more robust

If you’re implementing a cloud integration in AWS, you likely want to make it robust so that it can gracefully handle errors and recover from failures. To do this, you first need to understand the different ways in which your integration might fail.

My starting point for this is usually the AWS API docs for the service I’m integrating with. Almost every API action of every AWS service has an “Errors” section in its docs that lists the different types of errors it can return.

Using this as the starting point, I can then:

  • Understand what validation or sanitisation I need to perform on any request parameters I provide
  • Decide if I need to add compensating logic to handle a particular error type, such as a catch block in a Lambda function or a separate branch in a Step Functions state machine
  • Decide what edge-case automated tests I need to write in order to verify that any compensating logic I write is correct
  • Determine if there are any operational concerns, such as throughput-exceeded or throttling errors, that I need to consider in the code that invokes the API, e.g. ensuring I don’t flood an API with a ton of parallel requests
  • See whether a particular error type is transient, and so worth retrying, or permanent. If it’s retryable, I can then decide whether to let the error propagate unhandled out of a Lambda function (which triggers an automatic retry when the function is invoked asynchronously) or to handle retries manually in my own code
  • Know if this API call could produce a partial failure that I need to handle in my code (e.g. DynamoDB’s BatchWriteItem). Note: partial failures are typically reported in the API response object, not raised as an error type
  • Know what specific metrics I need to alert on in CloudWatch so I can monitor any operational issues
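
To make a couple of these points concrete, here is a minimal Python sketch built around DynamoDB’s BatchWriteItem. The error codes are ones listed in that action’s Errors section, but the split into transient and permanent is my own reading and an assumption you should check against the docs for your use case. The names classify_error and batch_write_with_retry, and the send callable, are hypothetical helpers for illustration and not part of any AWS SDK.

```python
import time
from typing import Any, Callable, Dict

# Error codes from the "Errors" section of the BatchWriteItem docs.
# ASSUMPTION: the transient/permanent split below is illustrative only.
TRANSIENT_CODES = {
    "InternalServerError",
    "ProvisionedThroughputExceededException",
    "RequestLimitExceeded",
}
PERMANENT_CODES = {
    "ResourceNotFoundException",
    "ItemCollectionSizeLimitExceededException",
}


def classify_error(code: str) -> str:
    """Map an error code to 'retry', 'fail' or 'unknown'."""
    if code in TRANSIENT_CODES:
        return "retry"
    if code in PERMANENT_CODES:
        return "fail"
    # Codes we have not reviewed yet should be surfaced, not silently retried.
    return "unknown"


def batch_write_with_retry(
    send: Callable[[Dict[str, Any]], Dict[str, Any]],
    request_items: Dict[str, Any],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Resend whatever comes back in UnprocessedItems, with exponential backoff.

    `send` is a hypothetical wrapper around the real API call, e.g.
    lambda items: client.batch_write_item(RequestItems=items) with boto3.
    Returns the items still unprocessed after the final attempt ({} on success).
    """
    pending = request_items
    for attempt in range(max_attempts):
        response = send(pending)
        # Partial failures arrive in the response body, not as an exception.
        pending = response.get("UnprocessedItems") or {}
        if not pending:
            return {}
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return pending
```

In a Lambda handler, a "retry" result might mean re-raising the error so the async invocation is retried for you, while a non-empty return value from batch_write_with_retry is something to alert on or route to a dead-letter queue. Injecting send and sleep also makes the partial-failure path easy to cover in an automated test without calling AWS.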

Trying to understand all the failure modes of a distributed cloud system can be overwhelming, but the simple step of RTFM, specifically the errors section of the API docs, is a great place to start.
