OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning

Federated learning (FL) has emerged as a promising approach to collaboratively train machine learning models across multiple edge devices while preserving privacy. The success of FL hinges on the efficiency of participating models and their ability to handle the unique challenges of distributed learning. While several variants of Vision Transformer (ViT) have shown great potential as alternatives to modern convolutional neural networks (CNNs) for centralized training, their unprecedented size and higher computational demands hinder their deployment on resource-constrained edge devices, challenging their widespread application in FL. Since client devices in FL typically have limited computing resources and communication bandwidth, models intended for such devices must strike a balance between model size, computational efficiency, and the ability to adapt to the diverse and non-IID data distributions encountered in FL. To address these challenges, we propose OnDev-LCT: Lightweight Convolutional Transformers for On-Device vision tasks with limited training data and resources. Our models incorporate image-specific inductive biases through the LCT tokenizer by leveraging efficient depthwise separable convolutions in residual linear bottleneck blocks to extract local features, while the multi-head self-attention (MHSA) mechanism in the LCT encoder implicitly facilitates capturing global representations of images. Extensive experiments on benchmark image datasets indicate that our models outperform existing lightweight vision models while having fewer parameters and lower computational demands, making them suitable for FL scenarios with data heterogeneity and communication bottlenecks.
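As a rough illustration of why depthwise separable convolutions suit a lightweight tokenizer, the sketch below compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1 x 1 pointwise convolution. The function names and the layer sizes are hypothetical and chosen only for illustration; they are not taken from the OnDev-LCT architecture, and bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, not values from the paper.
    c_in, c_out, k = 64, 128, 3
    std = standard_conv_params(c_in, c_out, k)
    sep = depthwise_separable_params(c_in, c_out, k)
    print(f"standard: {std} weights")
    print(f"depthwise separable: {sep} weights")
    print(f"reduction: {std / sep:.1f}x")
```

For these assumed sizes the separable form needs 8,768 weights against 73,728 for the standard convolution. A smaller model also means a smaller update to transmit in each FL communication round.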

