KheOps: Cost-effective Repeatability, Reproducibility, and Replicability of Edge-to-Cloud Experiments

Distributed infrastructures for computation and analytics are evolving towards an interconnected ecosystem that allows complex scientific workflows to be executed across hybrid systems spanning from IoT Edge devices to Clouds, and sometimes to supercomputers (the Computing Continuum). Understanding the performance trade-offs of large-scale workflows deployed on such a complex Edge-to-Cloud Continuum is challenging. To achieve this, one needs to perform experiments systematically, make them reproducible, and allow other researchers to replicate the study and its conclusions on different infrastructures. In practice, this comes down to the tedious process of reconciling the numerous experimental requirements and constraints with low-level infrastructure design choices. To address the limitations of the main state-of-the-art approaches for distributed, collaborative experimentation, such as Google Colab, Kaggle, and Code Ocean, we propose KheOps, a collaborative environment specifically designed to enable cost-effective reproducibility and replicability of Edge-to-Cloud experiments. KheOps is composed of three core elements: (1) an experiment repository; (2) a notebook environment; and (3) a multi-platform experiment methodology. We illustrate KheOps with a real-life Edge-to-Cloud application. The evaluations explore the point of view of the authors of an experiment described in an article (who aim to make their experiments reproducible) and the perspective of their readers (who aim to replicate the experiment). The results show how KheOps helps authors to systematically perform repeatable and reproducible experiments on the Grid5000 + FIT IoT LAB testbeds. Furthermore, KheOps helps readers to cost-effectively replicate the authors' experiments on different infrastructures, such as the Chameleon Cloud + CHI@Edge testbeds, and to reach the same conclusions with high accuracy (> 88% for all performance metrics).
