The distributed inference paradigm spreads the computation workload across multiple devices, facilitating the implementation of deep learning-based intelligent services in extremely resource-constrained Internet of Things (IoT) scenarios. Yet performing complicated inference tasks on a cluster of IoT devices that are heterogeneous in computing and communication capacity and prone to crash or timeout failures remains highly challenging. In this paper, we present RoCoIn, a robust cooperative inference mechanism for locally distributed execution of deep neural network-based inference tasks over heterogeneous edge devices. It creates a set of independent and compact student models, learned from a large model via knowledge distillation, for distributed deployment. In particular, the devices are strategically grouped so that each group redundantly deploys and executes the same student model, making the inference process resilient to any local failure, while a joint knowledge partition and student model assignment scheme is designed to minimize the response latency of the distributed inference system in the presence of devices with diverse capacities. Extensive simulations corroborate the superior performance of RoCoIn for distributed inference compared to several baselines, and the results demonstrate its efficacy in timely inference and failure resiliency.
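The redundancy idea can be illustrated with a minimal sketch. It is not the RoCoIn implementation: it assumes, purely for illustration, that every student model's output is required for a response, that a group responds as soon as its fastest surviving device finishes, and that per-device latencies are known in advance. The `Device` class, the device names, and the latency values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Device:
    name: str
    latency_ms: float   # hypothetical time to run its student model and reply
    alive: bool = True  # False models a crash or timeout failure


def group_latency(group: List[Device]) -> Optional[float]:
    """Latency of the fastest surviving replica, or None if every replica failed."""
    live = [d.latency_ms for d in group if d.alive]
    return min(live) if live else None


def response_latency(groups: Dict[str, List[Device]]) -> Optional[float]:
    """End-to-end latency under the assumptions above.

    The system waits for the slowest group, and each group is served by its
    fastest live device. Returns None if there are no groups or if some
    student model has no surviving replica.
    """
    per_group = [group_latency(g) for g in groups.values()]
    if not per_group or any(t is None for t in per_group):
        return None
    return max(per_group)


# Two student models, each deployed redundantly on two devices (made-up values).
groups = {
    "student_a": [Device("cam-1", 40.0), Device("cam-2", 95.0)],
    "student_b": [Device("hub-1", 60.0, alive=False), Device("hub-2", 80.0)],
}
print(response_latency(groups))  # 80.0
```

In this toy setting the failure of `hub-1` does not break inference, because `hub-2` holds the same student model; the cost is a higher latency for that group. An assignment scheme of the kind the abstract describes would choose the groups and student model sizes so that this worst-case group latency stays small.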