Computing Continuum (CC) systems must satisfy the intricate requirements of each computational tier. Given the scale of such systems, these requirements, expressed as Service Level Objectives (SLOs), must be broken down into smaller parts that can be enforced in a decentralized manner. We present a framework for collaborative edge intelligence that enables individual edge devices to (1) develop a causal understanding of how to enforce their SLOs and (2) transfer knowledge to speed up the onboarding of heterogeneous devices. Through collaboration, the devices (3) increase the scope of SLO fulfillment. We implemented the framework and evaluated it in a use case in which a CC system is responsible for ensuring Quality of Service (QoS) and Quality of Experience (QoE) during video streaming. Our results show that edge devices required only ten training rounds to ensure four SLOs; furthermore, the underlying causal structures were rationally explainable. New types of devices can be added a posteriori: the framework allowed them to reuse existing models even though the device type had previously been unknown. Finally, rebalancing the load within a device cluster allowed individual edge devices to recover their SLO compliance from 22% to 89% after a network failure.