Shared storage for lambda functions
2020-06-28
A recent announcement from AWS sparked some joy: you can now mount an actual file system on lambda functions. Lambdas have always come with a very limited 512-MB file system attached, but starting now an Elastic File System (EFS) can be attached to a function. The cool thing about EFS is that it can be shared: multiple systems can access the same file system, and you can even attach an EC2 instance to it. That has the potential to be a real game changer for creating simple, approachable data systems and APIs.
The setup is straightforward and can be completed solely via the AWS console (GUI):
- Create an EFS
- Create a lambda function and connect the EFS via an EFS Access Point
- Optionally spin up a t3.micro instance mounted via the same access point to administer the file system
EFS comes with reasonable pricing and properties like lifecycle management, AWS Backup support and redundant, durable storage across multiple availability zones. Of course it’s not as performant as an Elastic Block Store SSD directly attached to an EC2 instance, but with 35,000 IOPS in the default mode it should cover a wide range of use cases.
Having shared disk space available in lambdas makes it possible to run embedded databases in your function. Think RocksDB, LevelDB or my personal favorite, SQLite. I tested SQLite as soon as I read the news; it’s something I’ve always wanted: a no-frills shared database for lambdas. Until now you always needed to provision a separate system if you wanted shared, queryable storage: AWS RDS, Cloud Firestore or AWS DynamoDB. All come with restrictive limitations on IO and storage size. It’s also a big hindrance during development: having a SQLite file on your machine beats trying to connect to a cloud database service, which always ends with me figuring out the correct IAM permissions...
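To make the idea concrete, here is a minimal sketch of a function that keeps a SQLite file on the shared mount. It is written in Python rather than the Node.js used for the benchmark in this post, simply because `sqlite3` ships with Python’s standard library. The mount path `/mnt/data`, the `DB_PATH` variable and the `visits` table are assumptions made up for illustration, not part of the actual setup.

```python
import os
import sqlite3

# Assumption: the EFS access point is mounted at /mnt/data.
# DB_PATH and the "visits" table are hypothetical names.
DB_PATH = os.environ.get("DB_PATH", "/mnt/data/app.db")

_conn = None  # cached so warm invocations reuse the open connection


def get_connection(path=None):
    """Open the SQLite file once per container and reuse it afterwards."""
    global _conn
    if _conn is None:
        # timeout: wait up to 5 s if another function instance holds a lock.
        _conn = sqlite3.connect(path or DB_PATH, timeout=5.0)
        _conn.row_factory = sqlite3.Row
    return _conn


def handler(event, context):
    """Hypothetical handler: bump a counter per key, return all counters."""
    conn = get_connection()
    key = event.get("key", "default")
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS visits ("
            " key TEXT PRIMARY KEY, n INTEGER NOT NULL)"
        )
        conn.execute("INSERT OR IGNORE INTO visits VALUES (?, 0)", (key,))
        conn.execute("UPDATE visits SET n = n + 1 WHERE key = ?", (key,))
    rows = conn.execute("SELECT key, n FROM visits ORDER BY key").fetchall()
    return {row["key"]: row["n"] for row in rows}
```

Every function instance that mounts the same access point would see the same counters. Keeping the connection in a module-level variable is a design choice for warm invocations, so the file is not reopened on every request.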
Throw an API Gateway into the mix and you’ve got yourself a cheap, scalable REST API. You won’t get sub-20ms query performance, but a warmed-up lambda function does perform pretty solidly:
    $ ab -n 100 https://***.execute-api.eu-west-1.amazonaws.com/default/session-api

    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:      104  138  66.2    114     397
    Processing:   283  391 151.1    390    1806
    Waiting:      283  391 151.1    390    1806
    Total:        404  529 166.7    511    1976

    Percentage of the requests served within a certain time (ms)
      50%    511
      66%    513
      75%    515
      80%    534
      90%    617
      95%    717
      98%    850
      99%   1976
     100%   1976 (longest request)

This is for a somewhat simple query on a somewhat complex view (gist) executed on a Node.js 12 function with 1024-MB memory.
There are enough use cases where latency isn’t an issue: a simple metadata store for service discovery or a highly available configuration management service come to mind. Working a lot with stateful streaming data services, I could imagine having a queryable state store based on EFS & SQLite:
- Consume data in a service running on top of EC2 with EFS mounted
- Compute and persist the stream state to disk
- Provide access to the state via a lambda-backed API, e.g. for analytical queries on “live” user sessions
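The three steps above could be sketched roughly as follows, again in Python with the standard-library `sqlite3` module. Everything named here is an assumption for illustration: the path `/mnt/data/state.db`, the `sessions` table, and the `session_id` and `ts` fields of an event. The response shape follows the API Gateway Lambda proxy format (`statusCode` plus a string `body`).

```python
import json
import sqlite3

# Assumption: the EC2 consumer and the lambda function mount the same EFS
# directory. Path, table and event fields are hypothetical.
STATE_DB = "/mnt/data/state.db"


def open_state(path=STATE_DB):
    """Open the shared state file and make sure the table exists."""
    conn = sqlite3.connect(path, timeout=5.0)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sessions ("
        " session_id TEXT PRIMARY KEY,"
        " events INTEGER NOT NULL,"
        " last_seen INTEGER NOT NULL)"
    )
    return conn


def persist(conn, batch):
    """Consumer side (EC2): fold a batch of events into the session state."""
    with conn:  # one transaction per batch
        for event in batch:
            conn.execute(
                "INSERT OR IGNORE INTO sessions VALUES (?, 0, 0)",
                (event["session_id"],),
            )
            conn.execute(
                "UPDATE sessions SET events = events + 1,"
                " last_seen = MAX(last_seen, ?) WHERE session_id = ?",
                (event["ts"], event["session_id"]),
            )


def handler(event, context, conn=None):
    """API side (lambda): list sessions active since a given timestamp."""
    params = event.get("queryStringParameters") or {}
    since = int(params.get("since", 0))
    conn = conn or open_state()
    rows = conn.execute(
        "SELECT session_id, events, last_seen FROM sessions"
        " WHERE last_seen >= ? ORDER BY session_id",
        (since,),
    ).fetchall()
    body = [{"session_id": s, "events": n, "last_seen": t} for s, n, t in rows]
    return {"statusCode": 200, "body": json.dumps(body)}
```

Writing each batch in a single transaction keeps the time the writer holds the file lock short, which matters when readers on other machines are querying the same file.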
As you can guess, I’m pretty excited about this feature.