Two major techniques are commonly used to meet real-time inference constraints when distributing models across resource-constrained IoT devices: (1) model parallelism (MP) and (2) class parallelism (CP). In MP, transmitting bulky intermediate data (orders of magnitude larger than the input) between devices imposes a large communication overhead. Although CP solves this problem, it is limited in the number of sub-models it supports. In addition, neither solution is fault tolerant, which is a problem when deployed on edge devices. We propose variant parallelism (VP), an ensemble-based deep learning distribution method in which different variants of a main model are generated and can be deployed on separate machines. We design a family of lighter models around the original model and train them simultaneously to improve accuracy over single models. Our experimental results on six common mid-sized object recognition datasets demonstrate that our models can have 5.8-7.1x fewer parameters, 4.3-31x fewer multiply-accumulations (MACs), and 2.5-13.2x lower response time on atomic inputs compared to MobileNetV2, while achieving comparable or higher accuracy. Our technique easily generates several variants of the base architecture. Each variant returns only 2k outputs (1 ≤ k ≤ #classes/2), representing the Top-k classes, instead of the large volume of floating-point values required in MP. Since each variant provides a full-class prediction, our approach maintains higher availability than MP and CP in the presence of failures.
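To make the Top-k exchange concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not say what the 2k values contain or how they are combined, so the sketch assumes each variant sends k (class index, confidence) pairs and that the aggregator sums confidences over the variants that responded. The names `top_k` and `aggregate` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Assumption: the 2k outputs are k (class index, confidence) pairs.
TopK = List[Tuple[int, float]]


def top_k(scores: List[float], k: int) -> TopK:
    """Return the k highest-scoring (class index, score) pairs, i.e. 2k values."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]


def aggregate(responses: List[Optional[TopK]]) -> Optional[int]:
    """Pick the class with the largest summed confidence.

    A response of None stands for a variant that failed or timed out.
    Assumption: plain summation; the paper may use a different rule.
    """
    totals: Dict[int, float] = defaultdict(float)
    for response in responses:
        if response is None:
            continue  # skip failed variants; the rest still cover all classes
        for cls, score in response:
            totals[cls] += score
    if not totals:
        return None  # every variant failed
    return max(totals, key=lambda cls: totals[cls])


if __name__ == "__main__":
    # Two variants respond with k = 2; a third one has failed.
    variant_a = top_k([0.1, 0.6, 0.2, 0.1], k=2)
    variant_b = top_k([0.2, 0.5, 0.25, 0.05], k=2)
    print(aggregate([variant_a, variant_b, None]))
```

Because every variant predicts over the full class set, the aggregator can still return an answer when some responses are missing, which is the availability property the abstract describes.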