Deep neural networks (DNNs) have achieved remarkable performance across a wide range of tasks. However, this success often comes at the cost of unnecessarily large model sizes, high computational demands, and substantial memory footprints. Powerful architectures are typically trained at their full depth, yet not all datasets or tasks require such high model capacity. Training very deep architectures on relatively low-complexity datasets frequently leads to wasted computation, unnecessary energy consumption, and excessive memory usage, which in turn makes deploying these models on resource-constrained devices impractical. To address this problem, we introduce Optimally Deep Networks (ODNs), which strike a balance between model depth and task complexity. Specifically, we propose a NAS-like training strategy called progressive depth expansion, which begins by training deep networks at shallow depths and incrementally increases their depth as the earlier blocks converge, continuing this process until the target accuracy is reached. ODNs retain only the optimal depth for a given dataset, removing redundant layers. This cuts future training and inference costs, lowers the memory footprint, improves computational efficiency, and facilitates deployment on edge devices. Empirical results show that the optimal depths of ResNet-18 and ResNet-34 for MNIST and SVHN achieve up to 98.64% and 96.44% reductions in memory footprint, while maintaining competitive accuracies of 99.31% and 96.08%, respectively.
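The sketch below illustrates the progressive depth expansion idea in a PyTorch-style setting; it is a minimal illustration, not the authors' implementation. It assumes the network's blocks preserve the feature dimensionality (so the same head can be attached at any active depth), and the names `ProgressiveDepthNet`, `train_fn`, `evaluate_fn`, and `target_accuracy` are hypothetical placeholders.

```python
# Minimal sketch of progressive depth expansion (assumption: all blocks
# preserve feature dimensions, so the head works at any active depth).
import torch.nn as nn


class ProgressiveDepthNet(nn.Module):
    def __init__(self, stem, blocks, head):
        super().__init__()
        self.stem = stem                      # initial convolutional layers
        self.blocks = nn.ModuleList(blocks)   # full-depth pool of blocks
        self.head = head                      # classifier head
        self.active_depth = 1                 # start training at shallow depth

    def forward(self, x):
        x = self.stem(x)
        for block in self.blocks[: self.active_depth]:
            x = block(x)                      # only active blocks are used
        return self.head(x)

    def grow(self):
        """Activate one more block once the current depth has converged."""
        if self.active_depth < len(self.blocks):
            self.active_depth += 1


def train_progressively(model, train_fn, evaluate_fn, target_accuracy):
    """Train at increasing depths until the target accuracy is reached."""
    while True:
        train_fn(model)                       # train blocks up to active_depth
        acc = evaluate_fn(model)
        if acc >= target_accuracy or model.active_depth == len(model.blocks):
            # Optimal depth found (or full depth reached); redundant
            # inactive blocks can be discarded at this point.
            return model.active_depth, acc
        model.grow()                          # expand depth and continue
```

Under these assumptions, the returned `active_depth` corresponds to the optimal depth retained by an ODN, and the unused deeper blocks can be dropped to realize the memory and compute savings described above.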