Serverless computing is becoming increasingly popular due to its ease of use and fine-grained billing. These features make it appealing for stateful applications and serverless workflows. However, current serverless workflow systems use a control-flow-based invocation pattern to invoke functions. In this pattern, function invocation depends on the execution state of the precursor functions: a function can begin executing only once all its precursor functions have completed. As a result, this pattern can lead to longer end-to-end execution times. We design and implement DFlow, a novel dataflow-based serverless workflow system that achieves high performance for serverless workflows. DFlow introduces a distributed scheduler (DScheduler) that uses a dataflow-based invocation pattern to invoke functions. In this pattern, function invocation depends on the data dependencies between functions, so a function can start executing even while its precursor functions are still running. DFlow further features a distributed store (DStore) that uses fine-grained optimization techniques to eliminate function interaction, thereby enabling efficient data exchange. With the support of DScheduler and DStore, DFlow achieves an average improvement in 99th-percentile latency of 60% over CFlow, 40% over FaaSFlow, 25% over FaaSFlowRedis, and 40% over KNIX. Further, it improves network bandwidth utilization by 2x-4x over CFlow and by 1.5x-3x over FaaSFlow, FaaSFlowRedis, and KNIX. DFlow also effectively reduces cold-start latency, achieving an average improvement of 5.6x over CFlow and 1.1x over FaaSFlow.
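The difference between the two invocation patterns described in the abstract can be illustrated with a small timing model. The sketch below is a toy simulation, not DFlow's scheduler: the `Fn` record, the `startup` and `compute` costs, the four-function workflow, and the simplification that every function is launched at time 0 in the dataflow case are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Fn:
    """Hypothetical workflow function used only for this illustration."""
    name: str
    startup: float            # time to launch the function (assumed cost)
    compute: float            # time to run once all inputs are available
    deps: list = field(default_factory=list)  # names of precursor functions


def controlflow_finish_times(workflow):
    """Control-flow pattern: a function is invoked only after every
    precursor has completed, so its startup cost is paid afterwards."""
    finish = {}
    for fn in workflow:  # workflow is assumed to be in topological order
        start = max((finish[d] for d in fn.deps), default=0.0)
        finish[fn.name] = start + fn.startup + fn.compute
    return finish


def dataflow_finish_times(workflow):
    """Dataflow pattern (simplified): every function is launched at time 0,
    so startup overlaps with running precursors; computation begins as soon
    as the function is up and its input data has been produced."""
    finish = {}
    for fn in workflow:  # workflow is assumed to be in topological order
        inputs_ready = max((finish[d] for d in fn.deps), default=0.0)
        finish[fn.name] = max(fn.startup, inputs_ready) + fn.compute
    return finish


# A diamond-shaped workflow: A feeds B and C, which both feed D.
workflow = [
    Fn("A", startup=2.0, compute=3.0),
    Fn("B", startup=2.0, compute=1.0, deps=["A"]),
    Fn("C", startup=2.0, compute=4.0, deps=["A"]),
    Fn("D", startup=2.0, compute=1.0, deps=["B", "C"]),
]

cf = controlflow_finish_times(workflow)
df = dataflow_finish_times(workflow)
print("control-flow end-to-end:", cf["D"])  # 14.0
print("dataflow end-to-end:   ", df["D"])   # 10.0
```

In this model the dataflow schedule finishes earlier only because startup time is hidden behind the execution of precursor functions. The model ignores data-transfer costs, resource limits, and partial outputs, so the numbers say nothing about the measured results reported in the abstract.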