VanillaNet: the Power of Minimalism in Deep Learning

At the heart of foundation models is the philosophy of "more is different", exemplified by their astonishing success in computer vision and natural language processing. However, the optimization challenges and inherent complexity of transformer models call for a paradigm shift towards simplicity. In this study, we introduce VanillaNet, a neural network architecture that embraces elegance in design. By avoiding high depth, shortcuts, and intricate operations like self-attention, VanillaNet is refreshingly concise yet remarkably powerful. Each layer is carefully crafted to be compact and straightforward, with nonlinear activation functions pruned after training to restore the original architecture. VanillaNet overcomes the challenges of inherent complexity, making it ideal for resource-constrained environments. Its easy-to-understand and highly simplified architecture opens new possibilities for efficient deployment. Extensive experimentation demonstrates that VanillaNet delivers performance on par with renowned deep neural networks and vision transformers, showcasing the power of minimalism in deep learning. This visionary journey of VanillaNet has significant potential to redefine the landscape and challenge the status quo of foundation models, setting a new path for elegant and effective model design. Pre-trained models and code are available at https://github.com/huawei-noah/VanillaNet and https://gitee.com/mindspore/models/tree/master/research/cv/vanillanet.
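
The abstract does not spell out how pruning activation functions can restore a shallow architecture. One way to see why it can: two linear maps with no nonlinearity between them compose into a single linear map. The sketch below illustrates this under stated assumptions. It uses small fully connected layers in plain Python rather than the convolutions a vision model would use, the blending of the activation toward the identity via a parameter `lam` is an illustrative assumption and not something the abstract specifies, and all function names are hypothetical.

```python
def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def matmul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(ai * bi for ai, bi in zip(row, col)) for col in cols]
            for row in a]


def linear(w, b, x):
    """Affine layer: w @ x + b."""
    return [y + bi for y, bi in zip(matvec(w, x), b)]


def blended_act(x, lam):
    """Assumed blend: (1 - lam) * relu(x) + lam * x.

    lam = 0 gives a plain ReLU; lam = 1 gives the identity,
    i.e. the nonlinearity has been pruned away.
    """
    return [(1.0 - lam) * max(v, 0.0) + lam * v for v in x]


def two_layer(w1, b1, w2, b2, x, lam):
    """Two affine layers with the blended activation in between."""
    return linear(w2, b2, blended_act(linear(w1, b1, x), lam))


def merge_linear(w1, b1, w2, b2):
    """Collapse two activation-free affine layers into one.

    w2 @ (w1 @ x + b1) + b2 == (w2 @ w1) @ x + (w2 @ b1 + b2)
    """
    w = matmul(w2, w1)
    b = [y + bi for y, bi in zip(matvec(w2, b1), b2)]
    return w, b


# Hypothetical toy weights, chosen only for illustration.
w1, b1 = [[1.0, -2.0], [3.0, 4.0]], [0.5, -1.0]
w2, b2 = [[2.0, 0.0], [-1.0, 1.0]], [0.0, 1.0]
x = [1.0, 1.0]

w, b = merge_linear(w1, b1, w2, b2)
print(two_layer(w1, b1, w2, b2, x, lam=1.0))  # two layers, activation pruned
print(linear(w, b, x))                        # single merged layer
```

With `lam` at 1 the two printed vectors agree, so the pair of layers can be deployed as one. With `lam` below 1 the activation is still nonlinear and the merge is no longer exact, which is why such a merge is only valid once the activation has been fully removed.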

