Machine learning is moving toward high-order models that must be pre-trained on large datasets, a process that incurs substantial computational overhead. Conventional models already have pre-trained weights, but architectural differences prevent those weights from being transferred to, or used to initialize, high-order models, so the existing models risk becoming obsolete. To address these challenges, we introduce QuadraNet V2, a framework that uses quadratic neural networks to build efficient and sustainable high-order learning models. Our method initializes the primary term of each quadratic neuron from a standard neural network, while the quadratic term adaptively strengthens the learning of non-linearity or shifts in the data. Combining pre-trained primary terms with quadratic terms, which have greater modeling capacity, significantly increases the representational capacity of the high-order network. By reusing existing pre-trained weights, QuadraNet V2 reduces the GPU hours required for training by 90\% to 98.4\% compared with training from scratch, demonstrating both efficiency and effectiveness.
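The initialization idea can be illustrated with a minimal sketch. It assumes a second-order neuron of the form y = (W_a x) * (W_b x) + W_c x + b with an elementwise product, which may differ from the exact neuron used in QuadraNet V2. The class name `QuadraticLayer`, the helper `matvec`, the zero initialization of `w_a`, and the small random initialization of `w_b` are all hypothetical choices made for illustration. Under these assumptions the layer reproduces the pre-trained linear layer exactly at initialization, and the quadratic term only starts to contribute once `w_a` is updated during training.

```python
import random


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class QuadraticLayer:
    """Assumed form: y = (W_a x) * (W_b x) + W_c x + b, elementwise product.

    Illustrative only; not necessarily the neuron used by QuadraNet V2.
    """

    def __init__(self, w_pretrained, b_pretrained, seed=0):
        rng = random.Random(seed)
        n_out, n_in = len(w_pretrained), len(w_pretrained[0])
        # Primary (linear) term: copied from the pre-trained layer.
        self.w_c = [row[:] for row in w_pretrained]
        self.b = list(b_pretrained)
        # Quadratic term (assumed init): W_a starts at zero so the product
        # vanishes; W_b is small and non-zero so W_a still receives gradients.
        self.w_a = [[0.0] * n_in for _ in range(n_out)]
        self.w_b = [[rng.gauss(0.0, 0.01) for _ in range(n_in)]
                    for _ in range(n_out)]

    def forward(self, x):
        a = matvec(self.w_a, x)
        bq = matvec(self.w_b, x)
        c = matvec(self.w_c, x)
        return [ai * bi + ci + bias
                for ai, bi, ci, bias in zip(a, bq, c, self.b)]


# Usage: toy "pre-trained" weights for a 2-in, 2-out linear layer.
w = [[0.5, -1.0], [2.0, 0.25]]
b = [0.1, -0.2]
layer = QuadraticLayer(w, b)
x = [1.0, 2.0]
linear_out = [ci + bi for ci, bi in zip(matvec(w, x), b)]
# At initialization the quadratic layer matches the pre-trained layer.
assert layer.forward(x) == linear_out
```

Zero-initializing one factor of the quadratic term is one simple way to keep the pre-trained behavior as the starting point; other schemes that keep the quadratic contribution small at the start would serve the same purpose.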