Saturday, July 4, 2020

Packaging Java for AWS Lambda

When I want to package a Java application in a single JAR with all of its dependencies, I normally turn to Maven’s Shade plugin. This plugin builds an “UberJAR” by unpacking all of the files from the project's dependencies and repackaging them into a single JAR.

The Shade plugin is also used by the Lambda Developer's Guide. I was surprised, then to see this later in the same guide:

Reduce the time it takes Lambda to unpack deployment packages authored in Java by putting your dependency .jar files in a separate /lib directory. This is faster than putting all your function’s code in a single jar with a large number of .class files.

I was surprised because a single JAR containing class files should be the most efficient way to deploy a project. The JVM memory-maps the JAR files on its classpath, which means that it can access arbitrary parts of those files without an explicit call to the OS kernel. And UberJAR means that the JVM only has to examine a single directory structure.

After some investigation, I discovered the reason: Lamdba unpacks the contents of the deployment package into its filesystem. Which means that an UberJAR turns into a lot of little files, each of which must be read using several kernel calls.

OK, so if you’re a Maven user, how do you package your Lambdas in the prefered way? The answer is that instead of using the Shade plugin, you use the Assembly plugin. The stated goal of this plugin is to allow developers to package their project artifact with related information such as a website. But it's a very flexible plugin, using a “deployment descriptor” to specify which files should be included in the final artifact, and where they should go.

This is the descriptor that I used for my Java Lambda example. It's duplicated for each of the Lambdas, and is found at src/assembly/deployment.xml:

<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/ASSEMBLY/2.0.0 http://maven.apache.org/xsd/assembly-2.0.0.xsd">
    <id>deployment</id>
    <includeBaseDirectory>false</includeBaseDirectory>
    <formats>
        <format>zip</format>
    </formats>
    <dependencySets>
        <dependencySet>
            <outputDirectory>/lib</outputDirectory>
            <useProjectArtifact>true</useProjectArtifact>
            <unpack>false</unpack>
            <scope>runtime</scope>
        </dependencySet>
    </dependencySets>
</assembly>

The descriptor is simple: it says that every JAR used by the Lambda — including the one built by the project itself — should be stored in the /lib directory of the final artifact. To use it, I replace the Shade configuration in my POM with this (the plugin.assembly.version property defined as 3.2.0):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-assembly-plugin</artifactId>
    <version>${plugin.assembly.version}</version>
    <configuration>
        <descriptors>
            <descriptor>src/assembly/deployment.xml</descriptor>
        </descriptors>
    </configuration>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>single</goal>
            </goals>
        </execution>
    </executions>
</plugin>

And that's it: when I run mvn package I get a file in my target directory with the suffix -deployment.zip.