Traditional initialisation methods, such as He and Xavier initialisation, have been effective in avoiding vanishing or exploding gradients in neural networks. However, they rely on simple pointwise distributions, which model only one-dimensional variables. Moreover, they ignore most information about the architecture and disregard past training experience. These limitations can be overcome by employing generative models for initialisation. In this paper, we introduce two groups of new initialisation methods. First, we locally initialise weight groups using variational autoencoders. Second, we globally initialise full weight sets using graph hypernetworks. We thoroughly evaluate the impact of these generative models on state-of-the-art neural networks in terms of accuracy, convergence speed and ensembling. Our results show that global initialisations yield higher accuracy and faster initial convergence. However, their implementation through graph hypernetworks leads to diminished ensemble performance on out-of-distribution data. To counteract this, we propose a modification called the noise graph hypernetwork, which encourages diversity among the produced ensemble members. Furthermore, our approach might be able to transfer learned knowledge to different image distributions. Our work provides insights into the potential, the trade-offs and possible modifications of these new initialisation methods.
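For reference, the traditional baselines mentioned above can be sketched in a few lines. This is a minimal illustration and not code from the paper: the function names, the NumPy implementation and the layer sizes are assumptions chosen for the example. It shows what "pointwise" means here: each weight is drawn independently from the same one-dimensional distribution, whose scale depends only on the fan-in and fan-out of the layer.

```python
import numpy as np


def xavier_uniform(fan_in, fan_out, rng):
    """Xavier (Glorot) uniform: every weight is drawn i.i.d. from U(-limit, limit)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


def he_normal(fan_in, fan_out, rng):
    """He normal: every weight is drawn i.i.d. from N(0, 2 / fan_in)."""
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))


# Hypothetical layer sizes, used only for illustration.
rng = np.random.default_rng(0)
w_xavier = xavier_uniform(256, 128, rng)
w_he = he_normal(256, 128, rng)
```

Neither function looks at the rest of the architecture or at previously trained networks, which is the limitation the generative approaches in this paper aim to address.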