Python Extension Modules in AWS Lambda

In 2015, I wrote a very short post on how to build Python modules so that they would work in AWS Lambda functions. Some things have changed since then, others have not.

Generally, AWS Lambda is more capable, supports more languages, and is quickly becoming the go-to service for compute within the AWS Cloud.

Specific to Python, AWS Lambda currently supports Python 2.7 and 3.6. Python 2.7 is scheduled to sunset in 2020, so it’s a great time to start (I mean finish) migrating projects to 3.x. The good news is that very little changes between 2 and 3 when it comes to module deployment.

Modules

One of the questions that immediately bubbled up in the community was around 3rd party modules. What’s the best way to deploy & manage them?

The documentation has the basics down for most modules:

  1. Create a virtualenv locally
  2. Install the module inside the virtualenv
  3. Copy the module over to your project
  4. Import the module locally
  5. Bundle the local module with your Python code

Remember that your local file structure should look like this:

# local file structure
my_project
   my_project.py
   3rd_party_module_A
      3rd_party_module_A.py
   3rd_party_module_B
      3rd_party_module_B.py
   ...
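Step 3 above, copying packages out of the virtualenv so they sit beside your handler, is easy to script. This is a minimal sketch, assuming you pass in the virtualenv's site-packages path (for a virtualenv named `venv` on Linux or macOS that is typically `venv/lib/python3.6/site-packages`); `vendor_packages` is a hypothetical helper, not part of any library.

```python
import shutil
from pathlib import Path


def vendor_packages(site_packages, project_dir, packages):
    """Copy packages from a virtualenv's site-packages into the root of a
    Lambda project so they sit next to the handler module.

    `packages` are top-level names as they appear on disk: a directory for
    a package, or `name.py` for a single-file module.
    """
    site_packages = Path(site_packages)
    project_dir = Path(project_dir)
    for name in packages:
        pkg_dir = site_packages / name
        module_file = site_packages / (name + ".py")
        if pkg_dir.is_dir():
            dest = project_dir / name
            if dest.exists():
                shutil.rmtree(dest)  # drop a stale copy from a previous build
            shutil.copytree(pkg_dir, dest)
        elif module_file.is_file():
            shutil.copy2(module_file, project_dir / module_file.name)
        else:
            raise FileNotFoundError("{} not found in {}".format(name, site_packages))
    return list(packages)
```

It copies only the names you list, so any packages your modules depend on have to be listed as well.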

But what about extension modules that need to be compiled for the local system?

Extension Modules

As a quick reminder, an extension module is a module written in C or C++ that can either extend Python or call C or C++ libraries.

Extension modules are very handy when you have specific high-performance requirements or when you want to leverage an existing C/C++ library. There are a ton of high-quality, high-performance libraries already out there, so why not take advantage of them in Python?

Building Extension Modules

The challenge for extension modules is that they are compiled for a specific platform. You need a build chain (a C compiler and the Python development headers) available in addition to pip/Setuptools to get them configured in your Python environment.

The AWS Lambda team has published the execution environment details in the documentation. So we know that our functions are executed on an Amazon Linux instance with Boto 3 available in addition to the standard library for Python 2.7 or 3.6.

In order to work in AWS Lambda, extension modules need to be compiled for Amazon Linux x86_64.
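Before deploying, it can help to check which files in a bundle are compiled extensions and which platform they were built for. CPython embeds a platform tag in most extension filenames (for example `.cpython-36m-x86_64-linux-gnu.so` on 64-bit Linux, `.cpython-36m-darwin.so` on macOS, `.cp36-win_amd64.pyd` on Windows). The sketch below is a hypothetical helper that relies only on those filename conventions. Python 2.7 extensions are usually a bare `.so` with no tag, so they come back as `unknown` and need checking by other means.

```python
import os

# Suffixes CPython uses for compiled extension modules:
# ".so" on Linux and macOS, ".pyd" on Windows.
EXTENSION_SUFFIXES = (".so", ".pyd")


def classify_extension(filename):
    """Best-effort guess at the platform an extension was built for,
    based only on the platform tag embedded in its filename."""
    name = filename.lower()
    if name.endswith(".pyd"):
        return "windows"
    if "darwin" in name:
        return "macos"
    if "x86_64-linux-gnu" in name:
        return "linux-x86_64"
    return "unknown"  # e.g. "foo.so" carries no platform tag


def find_extensions(root):
    """Walk a project directory and map each extension file (path relative
    to `root`) to its guessed platform."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for fn in filenames:
            if fn.endswith(EXTENSION_SUFFIXES):
                rel = os.path.relpath(os.path.join(dirpath, fn), root)
                found[rel] = classify_extension(fn)
    return found
```

Anything other than `linux-x86_64` in the result is worth rebuilding or inspecting before it goes to Lambda. Plain shared libraries named like `libfoo.so` also match and show up as `unknown`.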

If you’re building your project directly in the AWS Cloud, you’re /probably/ all set (but we’ll still cover that later on). Your code will work on AWS Lambda (probably ;-) ). But if you’re developing locally on a MacBook or a Windows machine, that’s another story (and extension modules on Windows truly are another story altogether).

While you can develop locally with confidence that your Python code is going to continue to work, when it comes time to deploy it to AWS Lambda, we’re going to have to do some additional work.

The three simplest ways to build extension modules for AWS Lambda are:

  1. AWS Cloud9
  2. Docker
  3. EC2 Build Instance

AWS Cloud9

Yes, my top suggestion is to avoid the issue of building extension modules locally altogether. The AWS Cloud9 IDE is slick, real slick. The IDE runs on its own EC2 instance, so under the hood anything built via the IDE will work in AWS Lambda.

As an added bonus, the workflow for developing in AWS Lambda is smooth. Check it out.

Docker

AWS has published the Amazon Linux Container Image, which runs locally. This container is built from the same components as Amazon Linux and provides the perfect base for a build chain to create Python extension modules.

In fact, this is what AWS Cloud9 uses in the background for “local” development.

The workflow is simple:

  1. Write your Python code
  2. Deploy to an instance of the Amazon Linux Container Image
  3. Build any required extension modules
  4. Test your code
  5. Deploy to AWS Lambda (assuming things worked out!)
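Steps 2 and 3 of that workflow can be driven from a script. Here is a minimal sketch that builds and runs a `docker run` command mounting the project into an Amazon Linux container. It assumes Docker is installed locally and uses the `amazonlinux` image from Docker Hub; the helper names, the `/build` mount point, and the `latest` tag are illustrative choices, not anything AWS prescribes.

```python
import subprocess
from pathlib import Path


def docker_build_command(project_dir, build_script, image="amazonlinux:latest"):
    """Return a `docker run` argument list that mounts `project_dir` at
    /build inside an Amazon Linux container and runs `build_script` there."""
    project_dir = Path(project_dir).resolve()
    return [
        "docker", "run", "--rm",             # remove the container afterwards
        "-v", "{}:/build".format(project_dir),  # share the project with the container
        "-w", "/build",                      # start in the mounted directory
        image,
        "/bin/bash", "-c", build_script,
    ]


def build_in_container(project_dir, build_script, image="amazonlinux:latest"):
    # check=True raises CalledProcessError if the build script fails.
    subprocess.run(docker_build_command(project_dir, build_script, image), check=True)
```

The build script is up to you: typically installing a compiler and the Python headers with `yum`, then running `pip install` with `-t` pointing at a folder inside `/build`. Exact package names depend on the image version, so they are left out here.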

EC2 Build Instance

I like to think of AWS Lambda as a build target. I develop locally and then when it comes time to deploy, I need to ensure that I’ve built my project for the target environment.

When your project contains an extension module, you’re going to have to ensure that you have a version compiled for Amazon Linux x86_64 before you deploy it to the service.

Typically this is going to involve:

  1. Spin up a new Amazon Linux EC2 instance
  2. Run the following setup script with admin privileges (this works nicely as a user-data script):
  3. Install any modules we need using the syntax: pip install ______ -t ~/______
  4. Export the modules (from the folder specified after ‘-t’) so you can bundle them with your code
  5. Shut down the instance
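On the build instance, steps 3 and 4 amount to running pip once per module with a target directory. A minimal sketch, with a hypothetical package list and target path. Invoking pip as `python -m pip` through `sys.executable` ties the install to the interpreter running the script, which should be the same Python version as your Lambda runtime.

```python
import subprocess
import sys


def pip_install_commands(packages, target):
    """Build one `pip install <package> -t <target>` command per package,
    using the interpreter this script runs under."""
    return [
        [sys.executable, "-m", "pip", "install", pkg, "-t", target]
        for pkg in packages
    ]


def install_all(packages, target):
    for cmd in pip_install_commands(packages, target):
        # Stop at the first failure so a half-built folder isn't exported.
        subprocess.run(cmd, check=True)
```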

This process works for the majority of extension modules where any required libraries are installed on Amazon Linux by default. If the libraries required for the extension module aren’t available by default, you’re going to have to package them as well.
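One way to find out which shared libraries an extension needs but can’t resolve is `ldd`, run against the compiled `.so` on a clean Amazon Linux machine or container that hasn’t had extra development packages installed. Anything it reports as `not found` is a candidate for packaging alongside your code. The parser below is a sketch assuming GNU `ldd` output; `parse_ldd` and `missing_libraries` are hypothetical helpers.

```python
import subprocess


def parse_ldd(output):
    """Parse `ldd` output into {library_name: resolved_path_or_None}."""
    libs = {}
    for line in output.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # the dynamic loader line has no "=>"
        name, _, rest = line.partition("=>")
        rest = rest.strip()
        if not rest or rest.startswith("("):
            continue  # e.g. "linux-vdso.so.1 =>  (0x...)": no file on disk
        if rest.startswith("not found"):
            libs[name.strip()] = None
        else:
            libs[name.strip()] = rest.split(" (")[0].strip()  # drop "(0xADDR)"
    return libs


def missing_libraries(extension_path):
    """Run `ldd` on a compiled extension (Linux only) and list the
    libraries it could not resolve."""
    result = subprocess.run(["ldd", extension_path], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return sorted(n for n, p in parse_ldd(result.stdout).items() if p is None)
```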

Once you copy the exported module and any supporting libraries into your code bundle, you can zip it and use it in AWS Lambda.
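Matching the file structure shown earlier, the contents of the project folder (not the folder itself) should end up at the root of the archive. A minimal sketch using only the standard library; `make_bundle` is a hypothetical helper, and `zip_path` should live outside `src_dir` so the archive doesn’t end up containing itself.

```python
import os
import zipfile


def make_bundle(src_dir, zip_path):
    """Zip the contents of `src_dir` so the handler module and its
    dependencies sit at the root of the archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(src_dir):
            for fn in filenames:
                full = os.path.join(dirpath, fn)
                # Archive names are relative to src_dir, e.g. "numpy/__init__.py".
                zf.write(full, os.path.relpath(full, src_dir))
    return zip_path
```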

AWS Lambda Limits

This is a great time to bring up AWS Lambda limits. The service has limits on memory allocation, execution time, and—most relevant here—limits on the size of your function code.

The current (24-Feb-2018) limit is 50 MB for your zipped code (or jar).
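A quick pre-deploy check against that limit is easy to script. This sketch hard-codes 50 MB as an assumption (treating a megabyte as 1,048,576 bytes); since the limit changes, treat the constant as something to verify against the docs.

```python
import os

# Assumption: 50 MB zipped, taken as 50 * 1024 * 1024 bytes.
ZIPPED_LIMIT_BYTES = 50 * 1024 * 1024


def check_bundle_size(zip_path, limit=ZIPPED_LIMIT_BYTES):
    """Return the bundle size in bytes, raising ValueError if it exceeds `limit`."""
    size = os.path.getsize(zip_path)
    if size > limit:
        raise ValueError("{} is {:.1f} MB, over the {:.1f} MB limit".format(
            zip_path, size / 1024 / 1024, limit / 1024 / 1024))
    return size
```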

These limits change all of the time, so make sure to check the official docs. The good news is that you also have access to another 512 MB of disk space in /tmp if you need it.
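One way to put /tmp to use is to unpack an additional archive of dependencies there on a cold start and add it to `sys.path`. The sketch below assumes the archive is already on local disk (how it gets there is left out); the `/tmp/deps` path and the `unpack_to_tmp` name are hypothetical. Because Lambda can reuse a container between invocations, it skips extraction when the directory already exists.

```python
import os
import sys
import zipfile


def unpack_to_tmp(archive_path, dest="/tmp/deps"):
    """Extract a dependency archive into `dest` once per container and put
    it at the front of sys.path."""
    if not os.path.isdir(dest):
        partial = dest + ".partial"
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(partial)
        # Rename only after a complete extraction, so a failed cold start
        # doesn't leave a half-populated `dest` behind.
        os.rename(partial, dest)
    if dest not in sys.path:
        sys.path.insert(0, dest)
    return dest
```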

Specific to Python, it’s easy to run into size issues. For example, when I wrote the original post in 2015, SciPy weighed in at a whopping 221MB! That’s a problem. Fortunately, SciPy is very popular and has had some work done to reduce the build size and make it more portable. Ryan Scott Brown, writing over at Serverless Code, has a great post on how to build the module in its current design.

It’s now 40MB. That’s close enough to keep an eye on but still workable.

Designing For AWS Lambda

While AWS Lambda is capable of hosting an entire process or workflow in a single function, that’s not a good design choice.

Circling back to the SciPy stack, odds are we’ll want our functions leveraging pandas (data structuring & analysis) separate from those functions leveraging matplotlib (visualization). That’ll help avoid the size limits as well.

The key to success in Lambda is ensuring that your application is broken down into small, efficient functions. Remember you can always tie them together using AWS Step Functions or the [Amazon API Gateway](https://aws.amazon.com/api-gateway/).
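As a sketch of tying functions together, here is a two-step Amazon States Language definition built in Python: one task for a pandas-based analysis function, then one for a matplotlib-based plotting function, with the first task’s output becoming the second’s input. The ARNs are placeholders and the state names are arbitrary; the resulting JSON is what you would supply as the state machine definition (for example via boto3’s `create_state_machine`).

```python
import json


def two_step_state_machine(analyze_arn, plot_arn):
    """Return an Amazon States Language definition (as JSON) that runs an
    analysis function, then passes its output to a plotting function."""
    definition = {
        "Comment": "pandas analysis followed by matplotlib plotting",
        "StartAt": "Analyze",
        "States": {
            "Analyze": {"Type": "Task", "Resource": analyze_arn, "Next": "Plot"},
            "Plot": {"Type": "Task", "Resource": plot_arn, "End": True},
        },
    }
    return json.dumps(definition, indent=2)
```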

In the meantime, if you have any questions or comments, I’m @marknca on Twitter and GitHub :-)