Serverless computing, which runs functions with automatic scaling, is a popular task execution pattern in the cloud-native era. By connecting serverless functions into workflows, tenants can build complex functionality. Prior work orchestrates serverless workflows with the control-flow paradigm. However, this paradigm inherently incurs long response latency because of heavy data persistence overhead, sequential resource usage, and late function triggering. Our investigation shows that the data-flow paradigm has the potential to resolve these problems, given careful design and optimization. We propose DataFlower, a scheme that realizes the data-flow paradigm for serverless workflows. DataFlower abstracts a container into a function logic unit and a data logic unit: the function logic unit runs the functions, and the data logic unit handles data transmission asynchronously. In addition, a host-container collaborative communication mechanism supports efficient data transfer. Our experimental results show that, compared to state-of-the-art serverless designs, DataFlower reduces the 99th-percentile latency of the benchmarks by up to 35.4% and improves peak throughput by up to 3.8x.
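The split between a function logic unit and a data logic unit can be illustrated with a small toy model. The sketch below is not DataFlower's implementation: the `Container` class, its method names, and the `run_demo` workflow are hypothetical, and threads with in-process queues stand in for real containers and for the host-container communication mechanism. It only shows the data-flow idea that a function starts as soon as its inputs arrive, while a separate unit ships outputs downstream asynchronously.

```python
import queue
import threading


class Container:
    """Toy container split into a function logic unit and a data logic unit."""

    def __init__(self, name, func, num_inputs=1):
        self.name = name
        self.func = func
        self.num_inputs = num_inputs
        self.inbox = queue.Queue()    # data arriving from upstream containers
        self.outbox = queue.Queue()   # results waiting to be shipped downstream
        self.results = queue.Queue()  # sink used when there is no downstream
        self.downstream = []          # containers that consume this output

    def connect(self, other):
        """Declare that `other` consumes this container's output."""
        self.downstream.append(other)

    def _function_logic_unit(self):
        # Triggered by data availability: block until every input has arrived,
        # then run the function and hand the result to the data logic unit.
        args = [self.inbox.get() for _ in range(self.num_inputs)]
        self.outbox.put(self.func(*args))

    def _data_logic_unit(self):
        # Ships the output on its own thread, so data transfer does not
        # occupy the thread that runs the function.
        value = self.outbox.get()
        if not self.downstream:
            self.results.put(value)
        for target in self.downstream:
            target.inbox.put(value)

    def start(self):
        for unit in (self._function_logic_unit, self._data_logic_unit):
            threading.Thread(target=unit, daemon=True).start()


def run_demo(x):
    """Diamond workflow: source -> (double, square) -> total."""
    source = Container("source", lambda v: v)
    double = Container("double", lambda v: v * 2)
    square = Container("square", lambda v: v * v)
    # Inputs are collected in arrival order, so the join function is
    # deliberately commutative (addition).
    total = Container("total", lambda a, b: a + b, num_inputs=2)

    source.connect(double)
    source.connect(square)
    double.connect(total)
    square.connect(total)

    for container in (source, double, square, total):
        container.start()

    source.inbox.put(x)  # no central orchestrator: data arrival drives execution
    return total.results.get(timeout=5)


if __name__ == "__main__":
    print(run_demo(3))  # 2*3 + 3*3 = 15
```

In this toy model no orchestrator decides when `total` runs; it fires once both upstream values have landed in its inbox, which is the contrast with control-flow orchestration that the abstract draws.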