Building efficient and effective generative models for neural network weights has attracted significant research interest, but the task is challenged by the high-dimensional weight spaces of modern neural networks and their symmetries. Several prior generative models can produce only partial neural network weights, particularly for larger models such as ResNet and ViT. Those that do generate complete weights struggle with generation speed or require fine-tuning of the generated models. In this work, we present DeepWeightFlow, a Flow Matching model that operates directly in weight space to generate diverse, high-accuracy neural network weights across a variety of architectures, network sizes, and data modalities. The networks generated by DeepWeightFlow perform well without fine-tuning, and the approach scales to large networks. We apply Git Re-Basin and TransFusion for neural network canonicalization in the context of generative weight models, accounting for the impact of neural network permutation symmetries and improving generation efficiency for larger model sizes. The generated networks excel at transfer learning, and ensembles of hundreds of neural networks can be generated in minutes, far exceeding the efficiency of diffusion-based methods. DeepWeightFlow paves the way for more efficient and scalable generation of diverse sets of neural networks.
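To make the Flow Matching objective mentioned above concrete, the following is a minimal sketch of conditional Flow Matching over flattened weight vectors with a linear probability path; the architecture, dimensionality, and all identifiers (`WeightFieldNet`, `weight_dim`, `fm_step`) are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of conditional Flow Matching on flattened weight vectors,
# assuming linear (rectified-flow-style) probability paths. Everything here
# (names, sizes, architecture) is a hypothetical illustration.
import torch
import torch.nn as nn

weight_dim = 1024  # hypothetical flattened-weight dimensionality

class WeightFieldNet(nn.Module):
    """Hypothetical velocity-field network v_theta(x_t, t)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 2048), nn.SiLU(),
            nn.Linear(2048, 2048), nn.SiLU(),
            nn.Linear(2048, dim),
        )

    def forward(self, x_t, t):
        # Concatenate the scalar time onto the weight vector.
        return self.net(torch.cat([x_t, t], dim=-1))

model = WeightFieldNet(weight_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def fm_step(x1):
    """One training step; x1 is a batch of (canonicalized) weight vectors."""
    x0 = torch.randn_like(x1)                      # noise sample
    t = torch.rand(x1.size(0), 1)                  # uniform time in [0, 1]
    x_t = (1 - t) * x0 + t * x1                    # linear interpolation path
    target = x1 - x0                               # conditional velocity
    loss = ((model(x_t, t) - target) ** 2).mean()  # flow-matching loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

@torch.no_grad()
def sample(n, steps=32):
    """Generate weight vectors by Euler integration of the learned ODE."""
    x = torch.randn(n, weight_dim)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((n, 1), i * dt)
        x = x + dt * model(x, t)
    return x
```

Because sampling reduces to a few dozen ODE steps rather than a long reverse diffusion, generating many weight vectors in one batch is cheap, which is consistent with the fast ensemble generation the abstract claims.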