AWS Lambda is a serverless, event-driven compute service, part of a category of cloud compute offerings sometimes called Function-as-a-Service (FaaS). When we first released AWS Lambda, functions were limited to 250 MB of code and dependencies, packaged as a simple compressed archive. In 2020, we released support for deploying container images as large as 10 GiB as Lambda functions, allowing customers to bring much larger code bases and sets of dependencies to Lambda. Supporting larger packages while still meeting Lambda's goals of rapid scale (adding up to 15,000 new containers per second for a single customer, and much more in aggregate), high request rate (millions of requests per second), high scale (millions of unique workloads), and low start-up times (as low as 50 ms) presented a significant challenge. We describe the storage and caching system we built, optimized for delivering container images on demand, and our experiences designing, building, and operating it at scale. We focus on challenges around security, efficiency, latency, and cost, and on how we addressed them in a system that combines caching, deduplication, convergent encryption, erasure coding, and block-level demand loading. Since we built it, this system has reliably processed hundreds of trillions of Lambda invocations for over a million AWS customers, and has shown excellent resilience to load and infrastructure failures.
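To make the combination of deduplication, convergent encryption, and block-level demand loading named in the abstract more concrete, the following is a minimal sketch, not the system's implementation. The chunk size, the `ChunkStore` class, the optional `salt` parameter, and the helper names are all hypothetical, and the SHA-256 counter-mode keystream is a toy stand-in (chosen so the example needs only the standard library) for a real cipher. The idea it illustrates: when the key is derived from the chunk's own content, identical plaintext chunks encrypt to identical ciphertext, so the store can deduplicate them without seeing plaintext.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical, tiny on purpose so the example is easy to follow


def derive_key(chunk: bytes, salt: bytes = b"") -> bytes:
    """Convergent key: a hash of the chunk's own content (plus an optional salt)."""
    return hashlib.sha256(salt + chunk).digest()


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy deterministic stream cipher (SHA-256 in counter mode).

    Illustration only: a stand-in for a real cipher, not safe for production.
    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        block = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(block, pad))
    return bytes(out)


class ChunkStore:
    """Hypothetical content-addressed store holding only encrypted chunks."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put_image(self, image: bytes, salt: bytes = b"") -> list[tuple[str, bytes]]:
        """Split an image into chunks, encrypt each, and return a manifest.

        The manifest lists (ciphertext name, key) per chunk and stays with
        the image owner; the store itself never sees keys or plaintext.
        """
        manifest = []
        for start in range(0, len(image), CHUNK_SIZE):
            chunk = image[start:start + CHUNK_SIZE]
            key = derive_key(chunk, salt)
            ciphertext = keystream_xor(key, chunk)
            name = hashlib.sha256(ciphertext).hexdigest()
            # Same plaintext and salt give the same ciphertext and name,
            # so a duplicate chunk is stored only once.
            self.blobs.setdefault(name, ciphertext)
            manifest.append((name, key))
        return manifest

    def read_chunk(self, manifest: list[tuple[str, bytes]], index: int) -> bytes:
        """Demand-load a single chunk instead of fetching the whole image."""
        name, key = manifest[index]
        return keystream_xor(key, self.blobs[name])
```

In this sketch, two images that share most of their chunks (for example, a common base layer) occupy little more storage than one, and a reader fetches and decrypts only the chunks it actually touches. Using a different `salt` for different groups of images would deliberately prevent deduplication across those groups.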