We present the latest generation of MobileNets, MobileNetV4 (MNv4), featuring architecture designs that are universally efficient across mobile devices. At its core is the Universal Inverted Bottleneck (UIB) search block, a unified and flexible structure that merges the Inverted Bottleneck (IB), ConvNext, Feed Forward Network (FFN), and a novel Extra Depthwise (ExtraDW) variant. Alongside UIB, we present Mobile MQA, an attention block tailored for mobile accelerators that delivers a 39% speedup. We also introduce an optimized neural architecture search (NAS) recipe that improves the effectiveness of the MNv4 search. Integrating UIB, Mobile MQA, and the refined NAS recipe yields a new suite of MNv4 models that are mostly Pareto optimal across mobile CPUs, DSPs, and GPUs, as well as specialized accelerators such as the Apple Neural Engine and Google Pixel EdgeTPU, a characteristic not found in any other model tested. Finally, to further boost accuracy, we introduce a novel distillation technique. Enhanced by this technique, our MNv4-Hybrid-Large model reaches 87% ImageNet-1K accuracy with a Pixel 8 EdgeTPU runtime of just 3.8 ms.
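The abstract names the four block types that UIB merges but does not say how one structure covers them. The sketch below is one way to picture it. It assumes that UIB is an inverted bottleneck with two optional depthwise convolutions, one before the expansion layer and one between expansion and projection. The function names `uib_variant` and `uib_layers` and the layer labels are hypothetical and are not taken from any released code.

```python
from typing import List


def uib_variant(start_dw: bool, middle_dw: bool) -> str:
    """Name the block selected by the two optional depthwise convolutions.

    Assumption: the mapping below follows the two-optional-depthwise
    formulation of UIB; it is an illustration, not the authors' code.
    """
    if start_dw and middle_dw:
        return "ExtraDW"   # both depthwise convolutions present
    if middle_dw:
        return "IB"        # depthwise only between expansion and projection
    if start_dw:
        return "ConvNext"  # depthwise only before the expansion
    return "FFN"           # no depthwise: two pointwise layers only


def uib_layers(start_dw: bool, middle_dw: bool) -> List[str]:
    """List the layers of the selected block in execution order."""
    layers: List[str] = []
    if start_dw:
        layers.append("depthwise conv (before expansion)")
    layers.append("1x1 pointwise expansion")
    if middle_dw:
        layers.append("depthwise conv (after expansion)")
    layers.append("1x1 pointwise projection")
    return layers


if __name__ == "__main__":
    # Enumerate the four instantiations a search could choose between.
    for start in (False, True):
        for middle in (False, True):
            name = uib_variant(start, middle)
            print(f"{name}: {' -> '.join(uib_layers(start, middle))}")
```

Under this assumption, a NAS procedure would only need two binary choices per block to select among the four variants, which is what makes a single search block sufficient.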