Shared storage for lambda functions

2020-06-28

A recent announcement from AWS sparked some joy: you can now mount an actual file system on lambda functions. Lambdas have always come with a very limited 512 MB file system attached, but starting now an Elastic File System (EFS) can be attached to a function. The cool thing about EFS is that it can be shared: multiple systems can access the same file system, and you can even have an EC2 instance attach to it. That has the potential to be a real game changer for creating simple, approachable data systems and APIs.

The setup is straightforward and can be completed solely via the AWS console (GUI):

Create an EFS
Create a lambda function and connect the EFS via an EFS Access Point
Optionally spin up a t3.micro instance mounted via the same access point to administer the file system
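
For anyone who prefers scripting over clicking, the second step can be roughly sketched in Python. This is a minimal sketch under assumptions, not the setup used for this post: it expects a boto3 Lambda client to be passed in, assumes the file system, its access point and the function (placed in the same VPC as the EFS mount targets) already exist, and uses /mnt/data as a made-up mount path.

```python
def attach_efs(lambda_client, function_name, access_point_arn, mount_path="/mnt/data"):
    """Attach an existing EFS access point to an existing lambda function.

    lambda_client is expected to be a boto3 Lambda client, e.g.
    boto3.client("lambda"). The mount path is a hypothetical default.
    """
    # Lambda only accepts mount paths below /mnt/.
    if not mount_path.startswith("/mnt/"):
        raise ValueError("mount path must start with /mnt/")
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        FileSystemConfigs=[
            {"Arn": access_point_arn, "LocalMountPath": mount_path},
        ],
    )
```

Called as attach_efs(boto3.client("lambda"), "my-function", access_point_arn), where both the function name and the ARN are placeholders, the function should afterwards see the shared file system under /mnt/data.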

EFS comes with reasonable pricing and properties like lifecycle management, AWS Backup support and redundant, durable storage across multiple availability zones. Of course it’s not as performant as an Elastic Block Store SSD directly attached to an EC2 instance, but with 35,000 IOPS in the default mode it should cover a large array of use cases.

Having shared disk space available in lambdas makes it possible to run embedded databases in your function. Think RocksDB, LevelDB or my personal favorite, SQLite. I tested the last of these as soon as I read the news; it’s something I’ve always wanted: a no-frills shared database for lambdas. Until now you always needed to provision a separate system if you wanted shared queryable storage: AWS RDS, Cloud Firestore or AWS DynamoDB. All come with restrictive limits on IO and storage size. It’s also a big hindrance during development: having a SQLite file on your machine beats trying to connect to a cloud database service, which always ends up with me figuring out the correct IAM permissions...
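
To make the idea concrete, here is a minimal sketch of a function that keeps a small key-value table in a SQLite file on the shared mount. It is written in Python purely because sqlite3 ships with the standard library (the function measured in this post ran on Node.js), and the DB_PATH variable, the /mnt/data/app.db default, the settings table and the event shape are all assumptions made up for illustration.

```python
import json
import os
import sqlite3

# Hypothetical default; must match the mount path configured on the function.
DEFAULT_DB_PATH = "/mnt/data/app.db"


def open_db(path=None):
    """Open the SQLite file on the shared mount, creating the table if needed."""
    path = path or os.environ.get("DB_PATH", DEFAULT_DB_PATH)
    # The timeout makes a call wait for a competing writer's lock instead of failing at once.
    conn = sqlite3.connect(path, timeout=10)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def handler(event, context):
    """Store event["value"] under event["key"] if given, then return the stored value."""
    conn = open_db()
    try:
        if "value" in event:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
                    (event["key"], event["value"]),
                )
        row = conn.execute(
            "SELECT value FROM settings WHERE key = ?", (event["key"],)
        ).fetchone()
    finally:
        conn.close()
    if row is None:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
    return {"statusCode": 200, "body": json.dumps({"key": event["key"], "value": row[0]})}
```

During development the same handler runs unchanged against a local file by pointing DB_PATH at something like ./app.db, which is exactly the convenience described above.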

Throw an API Gateway into the mix and you’ve got yourself a cheap, scalable REST API. You won’t get sub-20ms query performance, but a warmed-up lambda function does have pretty solid performance:


$ ab -n 100 https://***.execute-api.eu-west-1.amazonaws.com/default/session-api

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      104  138  66.2    114     397
Processing:   283  391 151.1    390    1806
Waiting:      283  391 151.1    390    1806
Total:        404  529 166.7    511    1976

Percentage of the requests served within a certain time (ms)
  50%    511
  66%    513
  75%    515
  80%    534
  90%    617
  95%    717
  98%    850
  99%   1976
 100%   1976 (longest request)

This is for a somewhat simple query on a somewhat complex view (gist), executed on a Node.js 12 function with 1024 MB of memory.

There are enough use cases where latency isn’t an issue: a simple metadata store for service discovery or a highly available configuration management service come to mind. Working a lot with stateful streaming data services, I could imagine having a queryable state store based on EFS & SQLite:

Consume data in a service running on top of EC2 with EFS mounted
Compute and persist the stream state to disk
Provide access to the state via a lambda-backed API, e.g. for analytical queries on “live” user sessions
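
The three steps above could look roughly like the following Python sketch. The session_state table, its columns and all function names are invented for illustration; the writer half would run in the EC2 service and the reader half in the lambda, both pointing at the same file on EFS. It sticks to SQLite's default rollback journal, since the SQLite documentation states that WAL mode does not work over network file systems.

```python
import sqlite3

# Hypothetical schema: one row of aggregated state per user session.
SCHEMA = """
CREATE TABLE IF NOT EXISTS session_state (
    session_id  TEXT PRIMARY KEY,
    event_count INTEGER NOT NULL,
    last_seen   INTEGER NOT NULL
)
"""


def open_writer(path):
    """Writer side (the EC2 stream consumer): read-write connection."""
    conn = sqlite3.connect(path, timeout=10)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def apply_event(conn, session_id, timestamp):
    """Fold one stream event into the persisted state of its session."""
    with conn:  # one transaction per event
        cur = conn.execute(
            "UPDATE session_state "
            "SET event_count = event_count + 1, last_seen = MAX(last_seen, ?) "
            "WHERE session_id = ?",
            (timestamp, session_id),
        )
        if cur.rowcount == 0:
            # First event of this session: create its row.
            conn.execute(
                "INSERT INTO session_state (session_id, event_count, last_seen) "
                "VALUES (?, 1, ?)",
                (session_id, timestamp),
            )


def open_reader(path):
    """Reader side (the lambda): read-only connection to the same file."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True, timeout=10)


def live_sessions(conn, since):
    """Sessions seen at or after `since`, busiest first."""
    return conn.execute(
        "SELECT session_id, event_count FROM session_state "
        "WHERE last_seen >= ? ORDER BY event_count DESC, session_id",
        (since,),
    ).fetchall()
```

Opening the lambda side read-only keeps the API from ever modifying the state, so the stream consumer stays the single writer.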

As you can guess, I’m pretty excited about this feature.
