We propose an alternative approach to neural network training based on monotone vector fields, inspired by the seminal work of Juditsky and Nemirovski [Juditsky & Nemirovski, 2019], which was originally developed to solve parameter estimation problems for generalized linear models (GLMs) by reducing the original non-convex problem to a convex one: solving a monotone variational inequality (VI). Our approach leads to computationally efficient procedures that converge quickly and come with guarantees in some special cases, such as training a single-layer neural network or fine-tuning the last layer of a pre-trained model. In particular, it enables more efficient fine-tuning of a pre-trained model while the bottom layers are frozen, an essential step in deploying many machine learning models such as large language models (LLMs). We demonstrate its applicability to training fully-connected (FC) neural networks, graph neural networks (GNNs), and convolutional neural networks (CNNs), and show that it performs competitively with or better than stochastic gradient descent methods across various performance metrics on prediction tasks with both synthetic and real network data.
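To make the reduction concrete, the following is a minimal sketch of the VI idea for a single-layer model, not the paper's actual procedure. It assumes a monotone link (here a sigmoid), responses with mean `link(x @ theta)`, and the empirical vector field F(theta) = (1/N) sum_i x_i (link(x_i . theta) - y_i); the names `monotone_field` and `fit_vi`, the step size, and the iteration count are all illustrative choices. Unlike the gradient of a squared loss, this field carries no derivative-of-link factor, which is what keeps it monotone when the link is non-decreasing.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def monotone_field(theta, X, y, link=sigmoid):
    """Empirical field F(theta) = mean_i x_i * (link(x_i . theta) - y_i).

    Assumed form for illustration; monotone in theta when `link` is
    non-decreasing. Note the absence of a link-derivative factor.
    """
    residual = link(X @ theta) - y
    return X.T @ residual / X.shape[0]

def fit_vi(X, y, step=0.5, n_iter=2000, link=sigmoid):
    """Plain fixed-step iteration theta <- theta - step * F(theta).

    Hypothetical helper: step size and iteration count are arbitrary
    choices for this toy problem, not tuned recommendations.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        theta = theta - step * monotone_field(theta, X, y, link)
    return theta

# Toy check on synthetic, noiseless data (all values are made up).
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(5000, 3))
y = sigmoid(X @ theta_true)

theta_hat = fit_vi(X, y)
print(np.round(theta_hat, 3))
```

In the last-layer fine-tuning setting described above, `X` would be the features produced by the frozen bottom layers, so the same iteration applies unchanged to the trainable top layer.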