As Internet of Things (IoT) technology advances, end devices such as sensors and smartphones are increasingly equipped with AI models tailored to their local memory and computational constraints. Local inference reduces communication costs and latency; however, these smaller models typically underperform compared to more sophisticated models deployed on edge servers or in the cloud. Cooperative Inference Systems (CISs) address this performance trade-off by enabling less capable devices to offload part of their inference tasks to more capable ones. These systems often deploy hierarchical models that share numerous parameters, exemplified by Deep Neural Networks (DNNs) that employ strategies such as early exits or ordered dropout. In such settings, Federated Learning (FL) may be used to jointly train the models within a CIS. Yet traditional training methods have overlooked the operational dynamics of CISs during inference, particularly the potentially high heterogeneity in serving rates across clients. To address this gap, we propose a novel FL approach designed explicitly for CISs that accounts for these variations in serving rates. Our framework not only offers rigorous theoretical guarantees, but also surpasses state-of-the-art (SOTA) training algorithms for CISs, especially in scenarios where inference request rates or data availability are uneven among clients.
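The early-exit mechanism mentioned above can be sketched minimally: a device holds the first layers of a shared backbone plus an auxiliary classifier, and offloads the rest of the computation to a server-side sub-model only when the early exit is not confident enough. All names, thresholds, and dimensions below are illustrative assumptions, not the paper's actual architecture or method.

```python
import numpy as np

# Illustrative sketch of cooperative inference with an early-exit DNN.
# Weights are random; in a CIS they would be trained jointly (e.g. via FL).
rng = np.random.default_rng(0)

# Shared backbone, split across tiers:
W1 = rng.standard_normal((8, 16))       # device-side layer
W_exit = rng.standard_normal((16, 3))   # device-side early-exit head
W2 = rng.standard_normal((16, 16))      # server-side layer
W_final = rng.standard_normal((16, 3))  # server-side final head

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cooperative_infer(x, conf_threshold=0.8):
    """Run the device sub-model; offload to the server only when the
    early exit's top-class confidence falls below the threshold."""
    h = np.tanh(x @ W1)                  # device forward pass
    p_exit = softmax(h @ W_exit)
    if p_exit.max() >= conf_threshold:
        return int(p_exit.argmax()), "device"   # early exit: no offloading
    h2 = np.tanh(h @ W2)                 # offloaded server computation
    return int(softmax(h2 @ W_final).argmax()), "server"

x = rng.standard_normal(8)
label, served_by = cooperative_infer(x)
```

The confidence threshold governs the serving-rate split between tiers: raising it sends more requests to the server, which is precisely the kind of per-client heterogeneity the proposed training approach accounts for.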