Machine learning is evolving towards high-order models that necessitate pre-training on extensive datasets, a process that incurs significant overhead. Traditional models, despite having pre-trained weights, are becoming obsolete because architectural differences obstruct the effective transfer and initialization of those weights. To address these challenges, we introduce a novel framework, QuadraNet V2, which leverages quadratic neural networks to create efficient and sustainable high-order learning models. Our method initializes the primary term of the quadratic neuron from a standard pre-trained neural network, while the quadratic term adaptively captures data non-linearities and distribution shifts. This integration of pre-trained primary terms with quadratic terms, which possess advanced modeling capabilities, significantly augments the information characterization capacity of the high-order network. By reusing existing pre-trained weights, QuadraNet V2 reduces the GPU hours required for training by 90\% to 98.4\% compared to training from scratch, demonstrating both efficiency and effectiveness.
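The initialization scheme described above can be illustrated with a minimal sketch. It assumes a QuadraNet-style quadratic neuron of the form y = (W_a x) ⊙ (W_b x) + W_c x, where W_c is the primary (linear) term and W_a, W_b form the quadratic term; the variable names and the zero-initialization of the quadratic factors are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def quadratic_neuron(x, W_a, W_b, W_c):
    # Quadratic neuron: elementwise product of two linear maps
    # plus a primary linear term: (W_a x) * (W_b x) + W_c x
    return (W_a @ x) * (W_b @ x) + W_c @ x

rng = np.random.default_rng(0)

# Hypothetical pre-trained weights from a standard (first-order) layer
W_pretrained = rng.standard_normal((4, 8))
x = rng.standard_normal(8)

# Initialize the primary term from the pre-trained weights;
# start the quadratic factors at zero so training begins from
# the pre-trained model's behavior and learns non-linearity on top.
W_c = W_pretrained.copy()
W_a = np.zeros_like(W_pretrained)
W_b = np.zeros_like(W_pretrained)

y = quadratic_neuron(x, W_a, W_b, W_c)
# With zero quadratic factors the neuron exactly reproduces
# the pre-trained linear map, so no pre-trained knowledge is lost.
assert np.allclose(y, W_pretrained @ x)
```

With this initialization the high-order network starts as a functional copy of the pre-trained model, so only the quadratic term needs to be learned, which is one way the reported reduction in training GPU hours could be realized.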