We introduce a novel yet straightforward neural network initialization scheme that modifies conventional methods such as Xavier and Kaiming initialization. Inspired by the concept of emergence and building on the emergence measures proposed by Li (2023), our method adjusts the layer-wise weight scaling factors to achieve higher emergence values. The change is easy to implement and, unlike GradInit, requires no additional optimization steps at initialization. We evaluate our approach across several architectures: MLPs and convolutional networks for image recognition, and transformers for machine translation. We observe substantial improvements in both model accuracy and training speed, with and without batch normalization. Its simplicity, theoretical innovation, and demonstrable empirical advantages make the method a potent enhancement to neural network initialization practice. These results suggest that leveraging emergence is a promising direction for improving neural network training. Code is available at: https://github.com/johnnyjingzeli/EmergenceInit.
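As a rough illustration of what adjusting layer-wise weight scaling factors on top of a conventional scheme can look like, the following is a minimal sketch in plain Python. It draws Kaiming-normal weights (standard deviation sqrt(2 / fan_in)) and multiplies each layer's standard deviation by a per-layer factor. The function names and the example factors are hypothetical, chosen only for illustration; the rule the method actually uses to pick the factors from the emergence measure is not stated here and is not reproduced.

```python
import math
import random


def kaiming_std(fan_in):
    """Standard deviation of Kaiming (He) normal initialization for ReLU layers."""
    return math.sqrt(2.0 / fan_in)


def init_layer(fan_in, fan_out, scale=1.0, rng=None):
    """Return a fan_out x fan_in weight matrix as a list of rows.

    `scale` is a hypothetical layer-wise factor applied to the Kaiming
    standard deviation; scale=1.0 recovers plain Kaiming initialization.
    """
    rng = rng or random.Random(0)
    std = scale * kaiming_std(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def init_mlp(layer_sizes, layer_scales, seed=0):
    """Initialize every layer of an MLP with its own scaling factor.

    `layer_scales` holds one assumed factor per weight matrix. How these
    factors should be chosen is NOT specified by this sketch.
    """
    if len(layer_scales) != len(layer_sizes) - 1:
        raise ValueError("need exactly one scale per weight matrix")
    rng = random.Random(seed)
    shapes = zip(layer_sizes[:-1], layer_sizes[1:])
    return [
        init_layer(fan_in, fan_out, scale, rng)
        for (fan_in, fan_out), scale in zip(shapes, layer_scales)
    ]


# Example: a 3-layer MLP with illustrative (made-up) per-layer factors.
weights = init_mlp([784, 256, 128, 10], layer_scales=[1.5, 1.0, 0.75])
```

Because the factors only rescale the sampling distribution, a scheme of this shape adds no optimization loop at initialization; the cost is the same as the baseline initializer.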