
Deploying Lambda Functions with Binaries: LaTeX Compilation Implementation

18 November 2024


Imagine needing to generate professional-grade PDFs, process files, or run complex scripts in a serverless environment. AWS Lambda’s serverless model eliminates the hassle of managing infrastructure, but it comes with its own set of limitations—particularly when your application requires access to advanced software like LaTeX. The default Lambda runtime is lightweight and lacks the tools and binaries needed for such operations, which makes deploying these applications a significant challenge.

For applications requiring binaries or software exceeding the 50 MB deployment package size limit, a different approach is needed. In this guide, we'll tackle this problem by exploring how to deploy a LaTeX-powered Lambda function using a containerized custom runtime. By packaging all necessary dependencies into a Docker image, we can overcome Lambda's runtime restrictions while ensuring our function remains scalable and easy to deploy.

I'll walk you through the step-by-step process of creating a Docker-based deployment pipeline, enabling your Lambda function to seamlessly generate PDF documents using LaTeX. Whether you're building a document generation service, report creation system, or any other application requiring binary dependencies, this guide will help you navigate the complexities of deploying Lambda functions with custom runtime requirements.

Note: Writing Python scripts to compile LaTeX files is beyond the scope of this discussion. For simplicity, this guide assumes you have a Python script that processes LaTeX documents and generates PDFs. However, if you're just starting out, you can use Python's built-in subprocess module to invoke a TeX compiler such as pdflatex to generate PDFs from LaTeX sources.
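If you're at that starting point, here is a minimal sketch of such a script. It assumes pdflatex is available on the PATH, and the function names are illustrative, not part of any AWS API:

```python
import subprocess
from pathlib import Path


def build_pdflatex_command(tex_file: str, output_dir: str = "/tmp") -> list[str]:
    """Construct the pdflatex invocation. On Lambda, /tmp is the only
    writable location, so output defaults there."""
    return [
        "pdflatex",
        "-interaction=nonstopmode",   # never pause for user input
        f"-output-directory={output_dir}",
        tex_file,
    ]


def compile_latex(tex_file: str, output_dir: str = "/tmp") -> Path:
    """Run the compiler and return the path of the generated PDF.
    Raises subprocess.CalledProcessError if compilation fails."""
    subprocess.run(build_pdflatex_command(tex_file, output_dir), check=True)
    return Path(output_dir) / (Path(tex_file).stem + ".pdf")
```

The -interaction=nonstopmode flag matters in a serverless context: without it, a LaTeX error would leave the compiler waiting for keyboard input that never comes.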

The Challenge of Deploying Binaries to Lambda


AWS Lambda functions operate within a carefully controlled environment designed for simplicity and security. While this design philosophy works well for many applications, it introduces several significant limitations:

  1. Deployment Package Size:
    • ZIP deployment packages are limited to 50 MB compressed
    • Layer size restrictions of 250 MB uncompressed
    • Combined package and layer limit of 250 MB
  2. Package Management:
    • Cannot use standard package managers (apt, yum, etc.) the way you can on an EC2 instance or your local machine
    • Binary dependencies must be pre-compiled for Amazon Linux, which I learned the hard way when I first tried this setup. I started with Amazon Linux (since it's the default Lambda environment) but ran into a series of headaches with its yum package manager. The biggest pain point? The default Amazon Linux repositories carry far fewer TeX/LaTeX packages than the Ubuntu/Debian repositories.
    • Some packages you'd expect simply aren't there, and the ones that are often have different names than you're used to. For example, Ubuntu's `texlive-latex-extra` corresponds to `texlive-latex` in Amazon Linux - and trust me, that can lead to some confusing moments when you're following tutorials!

      Later in this guide, I'll show you how we can make our lives much easier by using Ubuntu as our base image instead. Sometimes the path of least resistance is the best path forward!

We'll build a Docker image that bundles all the necessary LaTeX dependencies and tooling into a custom runtime environment for our Lambda function.

Creating the Docker Image


The key to successfully deploying a Lambda function with binary dependencies is to create a self-contained, reproducible runtime environment (a Docker image). Docker is an excellent tool for this, as it allows us to package our function code, libraries, and system dependencies into a single, portable image.

Here's a high-level overview of the steps we'll take to create the Docker-based deployment:

  • Base Image Selection: We'll start with an Ubuntu-based image that includes the necessary system packages and tools, such as wget, tar, fontconfig, and the core LaTeX distribution. We'll also define the image in three stages for convenience.
    Set the DEBIAN_FRONTEND variable to noninteractive to bypass prompts that normally require user input during software installation; this is crucial because Docker builds run without user interaction.
  • Installing the Required Packages: After selecting Ubuntu as the base image, we install everything the Lambda function needs, in this case LaTeX, Python, and the dependencies required to install them. (You can install packages and libraries according to your own needs.)
  • Font Installation: In this specific image I am also installing fonts for LaTeX (Times New Roman); you can install other fonts the same way if they don't ship by default.
  • Setting the Working Directory: It is important to explicitly set the working directory where the function's code and dependencies reside. This is done using the WORKDIR instruction in the Dockerfile.
  • Copying Your Code Files: After specifying the working directory, copy your Python script into it. (In this case, the script compiles LaTeX.)
  • Installing pip and awslambdaric: This step is crucial regardless of the specific use case, whether you're compiling LaTeX, using custom binaries, or building a Lambda function with other non-standard system dependencies. When deploying a Docker-based Lambda function, awslambdaric is necessary because the function uses a custom runtime delivered as a container image.
    Unlike the standard AWS Lambda runtime (used when deploying via a ZIP file or an AWS SAM template), a containerized Lambda does not inherently manage communication with the AWS Lambda Runtime API.
    In this setup I am installing the Python version of awslambdaric, but AWS provides similar runtime interface clients for other languages, such as JavaScript, to support custom runtimes across a variety of programming environments.
  • Setting the Entry Point: The ENTRYPOINT instruction specifies the command that runs as the primary process when the container starts. In this case, it runs the Python interpreter located in the virtual environment (/venv/bin/python3.11) with the awslambdaric module. awslambdaric acts as the interface between the AWS Lambda service and your function's handler, enabling the container to receive Lambda events and return responses.
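To make the entry point concrete, here is a sketch of what the handler module it invokes might contain. The module and handler names (app.py exposing app.handler) and the event['latex'] field are assumptions for illustration, not fixed by AWS:

```python
import base64
import subprocess
from pathlib import Path


def make_pdf_response(pdf_bytes: bytes) -> dict:
    """Wrap raw PDF bytes in an API Gateway-style binary response."""
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/pdf"},
        "isBase64Encoded": True,
        "body": base64.b64encode(pdf_bytes).decode("ascii"),
    }


def handler(event, context):
    """Entry point that awslambdaric invokes (referenced as 'app.handler'
    when the container starts). Expects LaTeX source in event['latex']."""
    tex_path = Path("/tmp/document.tex")       # /tmp is Lambda's writable area
    tex_path.write_text(event["latex"])
    subprocess.run(
        ["pdflatex", "-interaction=nonstopmode",
         "-output-directory=/tmp", str(tex_path)],
        check=True,
    )
    return make_pdf_response(tex_path.with_suffix(".pdf").read_bytes())
```

The base64 encoding is what lets a binary PDF travel through Lambda's JSON-based response path.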

By following this approach, we can ensure that our Lambda function has access to all the necessary components to compile LaTeX sources and generate PDFs, without relying on the default Lambda runtime environment.

Building the image and deploying the Lambda Function

After writing the Dockerfile, we have to build the image and push it to AWS ECR. After that, we'll create a Lambda function and set it up to use the ECR repo as the source of its code. Here's a detailed step-by-step guide with commands.

  • Build the Docker Image: We'll use Docker to build the image, then push it to AWS's container registry (Amazon Elastic Container Registry, or ECR). Here I am using 'my-latex-lambda' as the image name; you can change it if you want.

    $ docker build -t my-latex-lambda .

  • Create an ECR Repo in AWS: The command below outputs JSON after creating the repo; you will need to copy the 'repositoryUri' value from it.

    $ aws ecr create-repository --repository-name my-repo

  • Authenticating Docker with the ECR Repo: Replace '<uri>' in the command below with the registry part of 'repositoryUri' (everything before '/my-repo'):

    $ aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <uri>

  • Tagging the Image and Pushing to the Repo: Again, replace '<uri>' in the commands below with 'repositoryUri'. Then run each command in sequence to push the image to the repo.

    $ docker tag my-latex-lambda:latest <uri>:latest

    $ docker push <uri>:latest

  • Create the Lambda Function: Go to the AWS Lambda console, create a new function, and choose the container image option, pointing it at the image we just pushed to ECR. With a container image, the runtime comes from the image itself (here, Python), so there is no separate runtime to select.

By following this deployment process, we can ensure that our LaTeX-powered Lambda function is packaged with all the necessary dependencies, making it easy to deploy and scale as needed.

Here's a reference implementation of the Docker image we discussed. It provides a solid foundation that you can adapt to your specific needs:
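The Dockerfile below is a sketch along the lines described above. It assumes your handler lives in app.py and is exposed as app.handler; the Ubuntu tag, the exact texlive package set, and the Python version are assumptions you should adapt:

```dockerfile
# --- Stage 1: Ubuntu base with LaTeX and Python ------------------------
FROM ubuntu:22.04 AS base
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
        wget tar fontconfig \
        texlive-latex-base texlive-latex-extra texlive-fonts-recommended \
        python3.11 python3.11-venv \
    && rm -rf /var/lib/apt/lists/*

# --- Stage 2: fonts (Times New Roman ships with the MS core fonts) -----
FROM base AS fonts
RUN echo "ttf-mscorefonts-installer msttcorefonts/accepted-mscorefonts-eula select true" \
        | debconf-set-selections \
    && apt-get update \
    && apt-get install -y ttf-mscorefonts-installer \
    && rm -rf /var/lib/apt/lists/* \
    && fc-cache -f

# --- Stage 3: function code plus the runtime interface client ----------
FROM fonts AS runtime
WORKDIR /var/task
COPY app.py .
RUN python3.11 -m venv /venv \
    && /venv/bin/pip install --no-cache-dir awslambdaric

# awslambdaric bridges the Lambda Runtime API to the handler
ENTRYPOINT ["/venv/bin/python3.11", "-m", "awslambdaric"]
CMD ["app.handler"]
```

Note the debconf-set-selections line: it pre-accepts the Microsoft fonts EULA, which would otherwise hang the non-interactive build exactly as described in the DEBIAN_FRONTEND step above.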


Conclusion

Deploying Lambda functions with binary dependencies presents unique challenges, but Docker-based containerization offers an elegant solution. This approach provides several key benefits:

Expanded Capacity: You can now package up to ~10GB of code and content in your function, eliminating the need to juggle multiple Lambda layers.
Simplified Dependencies: The containerized approach creates a self-contained, reproducible runtime environment that's easier to manage and deploy.
Flexibility: While we've focused on LaTeX in this example, this strategy works equally well for other use cases requiring custom binaries, such as:

  • Image processing tools
  • Video transcoding utilities
  • File conversion software
  • Custom compilation tools

However, it's important to note that this approach does come with some trade-offs. Larger container images can increase cold start times and impact function performance. As with any architectural decision, you'll need to balance these factors against your specific requirements.
The power of this approach lies in its versatility - once you understand how to containerize one type of binary dependency for Lambda, you can apply the same principles to virtually any other complex runtime requirement. Whether you're generating PDFs with LaTeX, processing images, or transcoding videos, the containerized Lambda pattern provides a robust foundation for your serverless applications.
