Variant Parallelism: Lightweight Deep Convolutional Models for Distributed Inference on IoT Devices

Two major techniques are commonly used to meet real-time inference constraints when distributing models across resource-constrained IoT devices: (1) model parallelism (MP) and (2) class parallelism (CP). In MP, transmitting bulky intermediate data (orders of magnitude larger than the input) between devices imposes a large communication overhead. Although CP solves this problem, it limits the number of sub-models. In addition, both solutions are fault-intolerant, which is a problem when they are deployed on edge devices. We propose variant parallelism (VP), an ensemble-based deep learning distribution method in which different variants of a main model are generated and can be deployed on separate machines. We design a family of lighter models around the original model and train them simultaneously to improve accuracy over single models. Our experimental results on six common mid-sized object recognition datasets demonstrate that our models can have 5.8-7.1x fewer parameters, 4.3-31x fewer multiply-accumulations (MACs), and 2.5-13.2x lower response time on atomic inputs compared to MobileNetV2, while achieving comparable or higher accuracy. Our technique easily generates several variants of the base architecture. Each variant returns only 2k outputs, where 1 <= k <= (#classes/2), representing the top-k classes, rather than the large volume of floating-point values that MP requires. Since each variant provides a full-class prediction, our approach maintains higher availability than MP and CP in the presence of failures.
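To make the 2k-output idea concrete, here is a minimal Python sketch of how a coordinator might merge top-k replies from several variants while tolerating device failures. The abstract does not specify the aggregation rule, so summing confidences per class is an assumption made for illustration, and the names `top_k`, `aggregate`, and `TopK` are hypothetical rather than taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Each variant sends k (class index, confidence) pairs, i.e. 2k values in total.
TopK = List[Tuple[int, float]]


def top_k(probs: Sequence[float], k: int) -> TopK:
    """Return the k most confident classes as (index, confidence) pairs."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


def aggregate(replies: Sequence[Optional[TopK]]) -> Optional[int]:
    """Pick the class with the highest summed confidence (assumed rule).

    A reply of None stands for a failed or timed-out device and is skipped,
    so a prediction is still produced as long as one variant answers.
    Returns None only if no variant replied.
    """
    scores: Dict[int, float] = defaultdict(float)
    for reply in replies:
        if reply is None:
            continue
        for cls, conf in reply:
            scores[cls] += conf
    if not scores:
        return None
    return max(scores, key=lambda c: scores[c])


# Two variants answer and a third device has failed.
variant_a = top_k([0.10, 0.60, 0.25, 0.05], k=2)
variant_b = top_k([0.05, 0.40, 0.50, 0.05], k=2)
prediction = aggregate([variant_a, variant_b, None])
```

Because every variant predicts over the full class set, losing one reply only removes one vote; under MP or CP a missing device would leave the result incomplete.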

