Existing serverless workflow orchestration systems are predominantly designed for a single-cloud FaaS system, which leads to vendor lock-in and restricts the performance optimization, cost reduction, and availability of applications. Orchestrating serverless workflows across multiple clouds, i.e., on Jointcloud FaaS systems, could lift these restrictions, but it faces two main challenges: (1) the additional overhead caused by centralized cross-cloud orchestration; and (2) the lack of reliable failover and fault-tolerance mechanisms for cross-cloud serverless workflows. To address these challenges, we propose Joint$λ$, a distributed runtime system that orchestrates serverless workflows on multiple FaaS systems without relying on a centralized orchestrator. Joint$λ$ introduces a compatibility layer, Backend-Shim, which leverages inter-cloud heterogeneity to optimize makespan and reduce cost under on-demand billing. By using function-side orchestration instead of centralized nodes, it allows function invocations and data transfers to proceed independently, reducing cross-cloud communication overhead. For high availability, it ensures exactly-once execution of serverless workflows on Jointcloud FaaS systems through datastores and failover mechanisms. We evaluate Joint$λ$ on two heterogeneous FaaS systems, AWS and Aliyun, with four workflows. Compared to the most advanced commercial orchestration services for single-cloud serverless workflows, Joint$λ$ reduces makespan by up to 3.3$\times$ while saving up to 65% in cost. Joint$λ$ is also up to 4.0$\times$ faster than state-of-the-art orchestrators for cross-cloud serverless workflows, while achieving competitive cost in representative scenarios and providing strong execution guarantees.
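To make the idea of datastore-backed execution guarantees concrete, the following is a minimal sketch of one common way to deduplicate function invocations with a conditional write. It is an illustration under stated assumptions, not the protocol used by Joint$λ$: the names `InMemoryStore`, `put_if_absent`, and `invoke_once` are hypothetical, the in-memory dictionary stands in for a cloud datastore that supports conditional writes, and recovery from a crash between the claim and the result write (which a failover mechanism would have to cover) is deliberately left out.

```python
import threading


class InMemoryStore:
    """Hypothetical stand-in for a cloud datastore with conditional writes."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key, value):
        """Write value only if key is unset; return True if this call wrote it."""
        with self._lock:
            if key in self._items:
                return False
            self._items[key] = value
            return True

    def get(self, key):
        """Return the stored value, or None if the key is unset."""
        with self._lock:
            return self._items.get(key)


def invoke_once(store, workflow_id, step, fn, payload):
    """Run fn at most once per (workflow_id, step).

    Returns (result, executed). A duplicate invocation does not re-run fn;
    it reads the stored result instead. If the first invocation has claimed
    the step but not yet stored its result, the duplicate sees None.
    """
    claim_key = f"{workflow_id}/{step}/claim"
    result_key = f"{workflow_id}/{step}/result"
    if store.put_if_absent(claim_key, "claimed"):
        # This caller won the claim, so it is the only one that executes fn.
        result = fn(payload)
        store.put_if_absent(result_key, result)
        return result, True
    # Another invocation already claimed this step; reuse its result.
    return store.get(result_key), False
```

In this sketch, a retried or duplicated invocation of the same workflow step, for example one triggered from a second cloud after a timeout, loses the claim and returns the recorded result rather than executing the function body again.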