Hypernetworks, neural networks that predict the parameters of another neural network, are powerful models that have been successfully used in diverse applications from image generation to multi-task learning. Unfortunately, existing hypernetworks are often challenging to train. Training typically converges far more slowly than for non-hypernetwork models, and the rate of convergence can be very sensitive to hyperparameter choices. In this work, we identify a fundamental and previously overlooked problem that contributes to the challenge of training hypernetworks: a magnitude proportionality between the inputs and outputs of the hypernetwork. We demonstrate both analytically and empirically that this proportionality can lead to unstable optimization, thereby slowing down convergence and sometimes even preventing any learning. We present a simple solution to this problem using a revised hypernetwork formulation that we call Magnitude Invariant Parametrizations (MIP). We demonstrate the proposed solution on several hypernetwork tasks, where it consistently stabilizes training and achieves faster convergence. Furthermore, we perform a comprehensive ablation study covering choices of activation function, normalization strategy, input dimensionality, and hypernetwork architecture, and find that MIP improves training in all scenarios. We provide easy-to-use code that can turn existing networks into MIP-based hypernetworks.
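To make the magnitude proportionality concrete, the following is a minimal NumPy sketch, not the released implementation. It assumes a toy hypernetwork built as a bias-free MLP with ReLU activations (the helper `mlp` and all layer sizes are hypothetical), for which scaling a positive input by a factor scales the predicted parameters by the same factor. It also shows one possible constant-norm input encoding, `constant_norm_encoding`, which is an illustrative assumption rather than something the abstract specifies.

```python
import numpy as np

rng = np.random.default_rng(0)


def mlp(weights, x):
    """Bias-free MLP with ReLU between layers (hypothetical toy hypernetwork)."""
    for W in weights[:-1]:
        x = np.maximum(W @ x, 0.0)
    return weights[-1] @ x


def constant_norm_encoding(g):
    """Assumed encoding: map a scalar g in [0, 1] to a 2-vector of unit norm."""
    return np.array([np.cos(np.pi * g / 2.0), np.sin(np.pi * g / 2.0)])


# Toy hypernetwork taking the raw scalar input and predicting 8 parameters.
raw_weights = [rng.normal(size=(16, 1)), rng.normal(size=(8, 16))]

small, large = 0.1, 1.0
out_small = mlp(raw_weights, np.array([small]))
out_large = mlp(raw_weights, np.array([large]))

# Without biases, ReLU layers are positively homogeneous, so the output
# norm grows in proportion to the input magnitude (here by large / small).
ratio_raw = np.linalg.norm(out_large) / np.linalg.norm(out_small)
print("output norm ratio for raw inputs:", ratio_raw)

# With the assumed encoding, the hypernetwork input has the same norm
# regardless of the value being encoded.
enc_small = constant_norm_encoding(small)
enc_large = constant_norm_encoding(large)
print("encoded input norms:", np.linalg.norm(enc_small), np.linalg.norm(enc_large))
```

In this sketch the raw-input ratio equals `large / small` up to floating-point error, while the encoded inputs both have unit norm. The sketch illustrates only the input side of the issue; it makes no claim about how the full MIP formulation treats the predicted parameters.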