Modern scientific workflows require hybrid infrastructures that combine numerous decentralized resources on the IoT/Edge with Cloud/HPC systems (also known as the Computing Continuum) to enable their optimized execution. Understanding and optimizing the performance of such complex Edge-to-Cloud workflows is challenging. Capturing the provenance of key performance indicators, together with their related data and processes, may assist in understanding and optimizing workflow executions. However, the capture overhead can be prohibitive, particularly on resource-constrained devices such as those on the IoT/Edge. To address this challenge, based on a performance analysis of existing systems, we propose ProvLight, a tool to enable efficient provenance capture on the IoT/Edge. We leverage simplified data models, data compression and grouping, and lightweight transmission protocols to reduce overheads. We further integrate ProvLight into the E2Clab framework to enable workflow provenance capture across the Edge-to-Cloud Continuum. This integration makes E2Clab a promising platform for the performance optimization of applications through reproducible experiments. We validate ProvLight at a large scale with synthetic workloads on 64 real-life IoT/Edge devices in the FIT IoT LAB testbed. Evaluations show that ProvLight outperforms state-of-the-art systems such as ProvLake and DfAnalyzer on resource-constrained devices. ProvLight is 26 to 37x faster at capturing and transmitting provenance data, uses 5 to 7x less CPU and 2x less memory, transmits 2x less data, and consumes 2 to 2.5x less energy. ProvLight and E2Clab are available as open-source tools.