Function-as-a-service (FaaS) is a popular serverless computing paradigm for developing event-driven functions that scale elastically on public clouds. FaaS workflows, such as AWS Step Functions and Azure Durable Functions, are composed from FaaS functions, such as AWS Lambda and Azure Functions, to build practical applications. However, the complex interactions between functions in a workflow and the limited visibility into the internals of proprietary FaaS platforms are major impediments to gaining a deeper understanding of FaaS workflow platforms. While several works characterize FaaS platforms to derive such insights, a principled and rigorous study of FaaS workflow platforms is lacking. These platforms have unique scaling, performance and cost behavior that is influenced by the platform design, dataflow and workloads. In this article, we perform extensive evaluations of three popular FaaS workflow platforms from AWS and Azure, running 25 micro-benchmark and application workflows over 132k invocations. Our detailed analysis confirms some conventional wisdom but also uncovers unique insights into function execution, workflow orchestration, inter-function interactions, cold-start scaling and monetary costs. Our observations help developers better configure and program these platforms, set performance and scalability expectations, and identify research gaps in enhancing the platforms.