Deploying neural networks to different devices or platforms is generally challenging, especially when the model is large or computationally complex. Although model pruning and distillation can reduce a model, they typically require a full round of training or fine-tuning to obtain a smaller model that satisfies a given size or complexity constraint. Motivated by recent work on dynamic neural networks, we propose a simple way to train a single large network and flexibly extract a subnetwork from it at inference time, given a model size or complexity constraint. We introduce a new training scheme in which the large model is trained with dynamic depth and width. Once it is trained, we can select a subnetwork of arbitrary depth and width at inference time, and this subnetwork performs better than the same subnetwork trained independently from scratch. Experimental results on a music source separation model show that the proposed method improves separation performance across different subnetwork sizes and complexities using a single large model, and that training the large model takes significantly less time than training all of the different subnetworks.
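The idea of extracting a subnetwork of a chosen depth and width from one large model can be illustrated with a minimal sketch. The abstract does not describe the actual architecture or selection rule, so everything here is an assumption made for illustration: a stack of residual blocks, "depth" meaning the first `depth` blocks, "width" meaning the first `width` hidden units of each block, and the hypothetical names `make_supernet`, `extract_subnetwork`, and `sample_config`. No training loop is shown; `sample_config` only indicates how a random depth and width might be drawn at each training step.

```python
import random


def make_supernet(num_layers, dim, max_width, seed=0):
    """Build a toy 'large model': a stack of residual blocks (assumed design).

    Each block maps dim -> max_width -> dim, so any prefix of blocks and any
    prefix of hidden units still yields a valid dim -> dim network.
    """
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return [
        {"w_in": mat(max_width, dim), "w_out": mat(dim, max_width)}
        for _ in range(num_layers)
    ]


def extract_subnetwork(supernet, depth, width):
    """Keep the first `depth` blocks and the first `width` hidden units of each.

    This prefix-slicing rule is an illustrative assumption, not the paper's method.
    """
    return [
        {
            "w_in": [row[:] for row in layer["w_in"][:width]],
            "w_out": [row[:width] for row in layer["w_out"]],
        }
        for layer in supernet[:depth]
    ]


def forward(net, x):
    """Apply each residual block: x <- x + W_out * relu(W_in * x)."""
    for layer in net:
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in layer["w_in"]]
        update = [sum(w * h for w, h in zip(row, hidden)) for row in layer["w_out"]]
        x = [a + b for a, b in zip(x, update)]
    return x


def count_params(net):
    """Total number of weights, used here as a stand-in for a size constraint."""
    return sum(len(m) * len(m[0]) for layer in net for m in layer.values())


def sample_config(supernet, rng):
    """Draw a random (depth, width), as one might do per training step."""
    max_depth = len(supernet)
    max_width = len(supernet[0]["w_in"])
    return rng.randint(1, max_depth), rng.randint(1, max_width)


supernet = make_supernet(num_layers=4, dim=8, max_width=16)
small = extract_subnetwork(supernet, depth=2, width=4)
output = forward(small, [1.0] * 8)
print(count_params(supernet), count_params(small), len(output))
```

Because every block preserves the feature dimension through its residual connection, dropping trailing blocks or trailing hidden units never breaks shape compatibility, which is what makes this kind of slicing possible in the sketch.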