The public model zoo, containing a vast number of powerful pretrained model families (e.g., ResNet/DeiT), has reached an unprecedented scope, which significantly contributes to the success of deep learning. As each model family consists of pretrained models with diverse scales (e.g., DeiT-Ti/S/B), a fundamental question naturally arises: how can these readily available models in a family be efficiently assembled for dynamic accuracy-efficiency trade-offs at runtime? To this end, we present Stitchable Neural Networks (SN-Net), a novel scalable and efficient framework for model deployment. Given a family of pretrained neural networks, which we call anchors, SN-Net cheaply produces numerous networks with different complexity and performance trade-offs. Specifically, SN-Net splits the anchors across the blocks/layers and then stitches them together with simple stitching layers that map the activations from one anchor to another. With only a few epochs of training, SN-Net effectively interpolates between the performance of anchors with varying scales. At runtime, SN-Net can instantly adapt to dynamic resource constraints by switching the stitching positions. Extensive experiments on ImageNet classification demonstrate that SN-Net can obtain on-par or even better performance than many individually trained networks while supporting diverse deployment scenarios. For example, by stitching Swin Transformers, we challenge hundreds of models in the Timm model zoo with a single network. We believe this new elastic model framework can serve as a strong baseline for further research in wider communities.
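The stitching and runtime-switching ideas described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes that a stitching layer is a plain linear projection, that stitching goes from the smaller anchor into the larger one, and that cost can be approximated by counting multiply-adds in the blocks. All names (`Anchor`, `make_linear`, `make_block`, `forward`, `cost`, `pick_position`) are hypothetical, the weights are random stand-ins for pretrained ones, and the stitching layers are untrained, whereas SN-Net trains them for a few epochs.

```python
import random

IN_DIM = 3  # toy input size (assumption, not from the paper)

def make_linear(d_in, d_out, seed):
    """Fixed random d_out x d_in linear map, standing in for learned weights."""
    rng = random.Random(seed)
    m = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    return lambda x: [sum(m[i][j] * x[j] for j in range(d_in)) for i in range(d_out)]

def make_block(width, seed):
    """Toy block: residual connection around a linear map followed by ReLU."""
    lin = make_linear(width, width, seed)
    return lambda x: [xi + max(0.0, yi) for xi, yi in zip(x, lin(x))]

class Anchor:
    """Stand-in for one pretrained model of a family (e.g. a small or large variant)."""
    def __init__(self, width, depth, seed):
        self.width = width
        self.stem = make_linear(IN_DIM, width, seed)
        self.blocks = [make_block(width, seed + 1 + i) for i in range(depth)]

def forward(x, small, large, stitches, k):
    """Run the first k blocks of `small`, stitch, then the remaining blocks of `large`.

    k == 0 runs the large anchor alone; k == depth runs the small anchor alone.
    """
    depth = len(small.blocks)
    if k == 0:
        h = large.stem(x)
    else:
        h = small.stem(x)
        for blk in small.blocks[:k]:
            h = blk(h)
        if k < depth:
            h = stitches[k](h)  # map small-anchor activations to the large width
    for blk in large.blocks[k:]:
        h = blk(h)
    return h

def cost(small, large, k):
    """Rough multiply-add count of the blocks (stems ignored) for stitching position k."""
    depth = len(small.blocks)
    c = k * small.width ** 2 + (depth - k) * large.width ** 2
    if 0 < k < depth:
        c += small.width * large.width  # the stitching layer itself
    return c

def pick_position(small, large, budget):
    """Pick the most expensive stitching position that still fits the budget."""
    depth = len(small.blocks)
    feasible = [k for k in range(depth + 1) if cost(small, large, k) <= budget]
    if not feasible:
        return depth  # fall back to the small anchor alone, the cheapest route here
    return max(feasible, key=lambda k: cost(small, large, k))

small = Anchor(width=4, depth=4, seed=0)
large = Anchor(width=8, depth=4, seed=50)
# One stitching layer per interior position, mapping width 4 to width 8.
stitches = {k: make_linear(small.width, large.width, 100 + k) for k in range(1, 4)}

costs = [cost(small, large, k) for k in range(5)]  # [256, 240, 192, 144, 64]
k = pick_position(small, large, budget=200)        # selects k == 2 (cost 192)
out = forward([1.0, 0.5, -0.5], small, large, stitches, k)
```

In this toy setup, changing the budget only changes which `k` is passed to `forward`; no weights are reloaded, which mirrors the claim that a single network can switch stitching positions at runtime.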