Weight initialization plays an important role in neural network training. Widely used initialization methods were proposed and evaluated for networks trained from scratch. However, the growing number of pretrained models now offers new opportunities for tackling this classical problem. In this work, we introduce weight selection, a method for initializing a smaller model by selecting a subset of weights from a pretrained larger model. This transfers knowledge from the pretrained weights to the smaller model. Our experiments demonstrate that weight selection can significantly improve the performance of small models and reduce their training time. Notably, it can also be combined with knowledge distillation. Weight selection offers a new way to leverage pretrained models in resource-constrained settings, and we hope it can be a useful tool for training small models in the large-model era. Code is available at https://github.com/OscarXZQ/weight-selection.
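To make "selecting a subset of weights" concrete, the following is a minimal sketch for a plain stack of linear layers, written in dependency-free Python. It assumes two rules that are illustrative choices, not necessarily the authors' exact procedure: the student keeps the first N teacher layers, and each kept weight matrix is sliced at evenly spaced row and column indices. The names `uniform_indices`, `select_layer`, and `weight_selection` are hypothetical; the repository linked above may implement selection differently.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]  # (weight of shape [out][in], bias of length out)


def uniform_indices(big: int, small: int) -> List[int]:
    """Return `small` evenly spaced indices into range(big).

    Assumption: evenly spaced selection is one possible element-selection rule.
    """
    if not 0 < small <= big:
        raise ValueError(f"cannot select {small} of {big} entries")
    return [i * big // small for i in range(small)]


def select_layer(layer: Layer, out_dim: int, in_dim: int) -> Layer:
    """Slice one teacher layer down to the student's (out_dim, in_dim) shape."""
    weight, bias = layer
    rows = uniform_indices(len(weight), out_dim)
    cols = uniform_indices(len(weight[0]), in_dim)
    new_weight = [[weight[r][c] for c in cols] for r in rows]
    new_bias = [bias[r] for r in rows]  # bias follows the selected output units
    return new_weight, new_bias


def weight_selection(teacher: Sequence[Layer], dims: Sequence[int]) -> List[Layer]:
    """Initialize a student MLP whose widths are `dims` (input width first).

    Hypothetical layer selection: keep the first len(dims) - 1 teacher layers.
    Because uniform_indices depends only on (big, small), the output units kept
    in layer k are exactly the input units kept in layer k + 1.
    """
    n_layers = len(dims) - 1
    if n_layers > len(teacher):
        raise ValueError("student is deeper than teacher")
    return [
        select_layer(teacher[k], out_dim=dims[k + 1], in_dim=dims[k])
        for k in range(n_layers)
    ]


# Hypothetical teacher: three 4x4 layers with easily recognizable values.
teacher = [
    ([[100 * k + 10 * i + j for j in range(4)] for i in range(4)],
     [float(i) for i in range(4)])
    for k in range(3)
]
student = weight_selection(teacher, dims=[4, 2, 2])
print(student[0][0])  # [[0, 1, 2, 3], [20, 21, 22, 23]]
print(student[1][0])  # [[100, 102], [120, 122]]
```

Using a deterministic index rule is what keeps adjacent layers consistent: a unit dropped from one layer's outputs is also dropped from the next layer's inputs, so the student starts from a coherent sub-network of the teacher rather than from unrelated fragments.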