This paper focuses on developing modern, efficient, lightweight models for dense prediction while trading off parameters, FLOPs, and performance. The Inverted Residual Block (IRB) serves as the infrastructure for lightweight CNNs, but no counterpart has been recognized in attention-based studies. This work rethinks lightweight infrastructure by viewing the efficient IRB and the effective components of the Transformer from a unified perspective, extending the CNN-based IRB to attention-based models and abstracting a one-residual Meta Mobile Block (MMB) for lightweight model design. Following a simple but effective design criterion, we deduce a modern Inverted Residual Mobile Block (iRMB) and build a ResNet-like Efficient MOdel (EMO) using only iRMBs for downstream tasks. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of EMO over state-of-the-art methods. For example, EMO-1M/2M/5M achieve 71.5%, 75.1%, and 78.4% Top-1 accuracy, surpassing CNN- and attention-based models of similar size while balancing parameters, efficiency, and accuracy well, and run 2.8-4.0× faster than EdgeNeXt on iPhone 14.
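The abstract describes the Meta Mobile Block only at a high level: one residual connection wrapped around an "expand, apply an efficient operator, shrink" pipeline, which covers the CNN-style IRB and the Transformer FFN and attention as special cases of the operator. The NumPy sketch below illustrates that abstraction on a flattened token sequence of shape (N, C). It is not the authors' implementation. All names (`meta_mobile_block`, `relu_operator`, `attention_operator`, `mmb_params`) are hypothetical. The attention variant, which computes Q and K from the unexpanded input and uses the expanded features as V, is our assumption about how an attention operator could slot in. Normalization, biases, depthwise convolution, and window partitioning are omitted.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def meta_mobile_block(x, w_expand, w_shrink, operator):
    """Hypothetical one-residual block: x + shrink(F(x, expand(x)))."""
    x_e = x @ w_expand         # expand channels: (N, C) -> (N, lam*C)
    x_f = operator(x, x_e)     # efficient operator F on the expanded features
    return x + x_f @ w_shrink  # shrink back to (N, C), then the single residual


def relu_operator(x, x_e):
    """F = pointwise nonlinearity: the block then acts like an FFN / channel MLP."""
    return np.maximum(x_e, 0.0)


def attention_operator(w_q, w_k):
    """F = single-head self-attention (assumption: Q, K from x, V = expanded x_e)."""
    def op(x, x_e):
        q, k = x @ w_q, x @ w_k
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N, N) token-mixing weights
        return attn @ x_e                               # mix tokens of the expanded features
    return op


def mmb_params(c, lam):
    """Weights of the expand + shrink projections only (no biases, no operator weights)."""
    return 2 * lam * c * c


rng = np.random.default_rng(0)
N, C, lam = 16, 8, 4
x = rng.standard_normal((N, C))
w_e = rng.standard_normal((C, lam * C)) * 0.1
w_s = rng.standard_normal((lam * C, C)) * 0.1

ffn_like = meta_mobile_block(x, w_e, w_s, relu_operator)
attn_like = meta_mobile_block(
    x, w_e, w_s,
    attention_operator(rng.standard_normal((C, C)), rng.standard_normal((C, C))),
)
print(ffn_like.shape, attn_like.shape)  # (16, 8) (16, 8)
print(mmb_params(C, lam))               # 512
```

Swapping only the `operator` argument while keeping the same expand/shrink projections and single residual shows why a single abstract block can cover both convolution- and attention-style designs. In this sketch, the projection cost grows as 2·λ·C², independent of the operator chosen.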