Existing serverless workflow orchestration systems are predominantly designed for a single-cloud FaaS system, which leads to vendor lock-in and limits an application's performance, cost efficiency, and availability. Orchestrating serverless workflows across Jointcloud FaaS systems can lift these limits, but it faces two main challenges: (1) the additional overhead caused by centralized cross-cloud orchestration; and (2) the lack of reliable failover and fault-tolerance mechanisms for cross-cloud serverless workflows. To address these challenges, we propose Joint$λ$, a distributed runtime system that orchestrates serverless workflows on multiple FaaS systems without relying on a centralized orchestrator. Joint$λ$ introduces a compatibility layer, Backend-Shim, which leverages inter-cloud heterogeneity to optimize makespan and reduce costs with on-demand billing. By using function-side orchestration instead of centralized nodes, it enables independent function invocations and data transfers, reducing cross-cloud communication overhead. For high availability, it ensures exactly-once execution of serverless workflows on Jointcloud FaaS systems through datastores and failover mechanisms. We validate Joint$λ$ on two heterogeneous FaaS systems, AWS and Aliyun, with four workflows. Compared to the most advanced commercial orchestration services for single-cloud serverless workflows, Joint$λ$ reduces makespan by up to 3.3$\times$ while saving up to 65% in cost. Joint$λ$ is also up to 4.0$\times$ faster than state-of-the-art orchestrators for cross-cloud serverless workflows, while achieving competitive cost in representative scenarios and providing strong execution guarantees.
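The combination of function-side orchestration and datastore-backed exactly-once execution can be illustrated with a minimal sketch. It is not Joint$λ$'s implementation: the `Datastore` and `Workflow` classes, the `put_if_absent` method, and the key layout are all hypothetical names chosen for illustration, and the in-memory store only stands in for a cloud datastore that supports conditional writes. Each function claims its step with a conditional write before running, then invokes its successors directly, so no central orchestrator is involved and a duplicated invocation becomes a no-op. The sketch covers duplicate suppression only; recovering a step whose claim holder crashed (failover) is not shown.

```python
import threading
from typing import Any, Callable, Dict, List, Sequence


class Datastore:
    """In-memory stand-in (an assumption) for a store with conditional writes."""

    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key: str, value: Any) -> bool:
        """Atomically write value only if key is new; return True on success."""
        with self._lock:
            if key in self._items:
                return False
            self._items[key] = value
            return True

    def get(self, key: str) -> Any:
        with self._lock:
            return self._items.get(key)


class Workflow:
    """Hypothetical function-side orchestration: each step triggers the next."""

    def __init__(self, store: Datastore) -> None:
        self.store = store
        self.functions: Dict[str, Callable[[Any], Any]] = {}
        self.successors: Dict[str, List[str]] = {}
        self.run_counts: Dict[str, int] = {}

    def register(self, name: str, fn: Callable[[Any], Any],
                 successors: Sequence[str] = ()) -> None:
        self.functions[name] = fn
        self.successors[name] = list(successors)
        self.run_counts[name] = 0

    def invoke(self, run_id: str, name: str, payload: Any) -> bool:
        """Run one step at most once per run_id; return False for duplicates."""
        # Claim the step first; a duplicate invocation loses the conditional write.
        if not self.store.put_if_absent(f"{run_id}/{name}/claim", True):
            return False
        result = self.functions[name](payload)
        self.run_counts[name] += 1
        self.store.put_if_absent(f"{run_id}/{name}/result", result)
        # The function itself invokes its successors; no central node is involved.
        for successor in self.successors[name]:
            self.invoke(run_id, successor, result)
        return True


store = Datastore()
workflow = Workflow(store)
workflow.register("increment", lambda x: x + 1, successors=["double"])
workflow.register("double", lambda x: x * 2)

first = workflow.invoke("run-1", "increment", 1)   # executes both steps
second = workflow.invoke("run-1", "increment", 1)  # duplicate delivery, skipped
```

In this sketch the second invocation with the same `run_id` finds the claim key already present and returns without re-running either step, which is the behavior a retried or duplicated cross-cloud invocation would need.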