First-order optimization (FOO) algorithms are pivotal in numerous computational domains such as machine learning and signal denoising. However, their application to complex tasks like neural network training is often inefficient because convergence requires many sequential iterations. In response, we introduce first-order optimization expedited with approximately parallelized iterations (OptEx), the first framework that enhances the efficiency of FOO by leveraging parallel computing to mitigate its iterative bottleneck. OptEx employs kernelized gradient estimation, which uses the gradient history to predict future gradients, thereby enabling parallelization of iterations, a strategy once considered impractical because of the inherent iterative dependency in FOO. We provide theoretical guarantees for the reliability of our kernelized gradient estimation and for the iteration complexity of SGD-based OptEx, confirming that estimation errors diminish to zero as historical gradients accumulate and that SGD-based OptEx enjoys an effective acceleration rate of $\Omega(\sqrt{N})$ over standard SGD given parallelism of $N$. Extensive empirical studies, including synthetic functions, reinforcement learning tasks, and neural network training across various datasets, underscore the substantial efficiency improvements achieved by OptEx.
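To make the idea of predicting future gradients from the gradient history concrete, the following is a minimal sketch, not the authors' implementation. It assumes an RBF kernel shared across gradient coordinates and a small ridge term for numerical stability; the function names (`rbf_kernel`, `estimate_gradient`, `proxy_iterates`) and the parameters `lengthscale`, `reg`, and `lr` are hypothetical choices made for illustration only.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between rows of A and rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def estimate_gradient(x, hist_x, hist_g, lengthscale=1.0, reg=1e-6):
    """Kernel-ridge estimate of the gradient at x from past (input, gradient) pairs.

    hist_x has shape (T, d) and hist_g has shape (T, d). The kernel choice and
    the ridge term `reg` are assumptions of this sketch.
    """
    K = rbf_kernel(hist_x, hist_x, lengthscale) + reg * np.eye(len(hist_x))
    k = rbf_kernel(x[None, :], hist_x, lengthscale)  # shape (1, T)
    return (k @ np.linalg.solve(K, hist_g))[0]       # shape (d,)


def proxy_iterates(x0, hist_x, hist_g, lr, n_parallel, **kernel_args):
    """Roll out cheap proxy steps using estimated gradients only.

    Returns an array of n_parallel points (x0 followed by the proxies). The
    true gradients at these points do not depend on one another, so they
    could be evaluated in parallel.
    """
    points = [x0]
    for _ in range(n_parallel - 1):
        g_hat = estimate_gradient(points[-1], hist_x, hist_g, **kernel_args)
        points.append(points[-1] - lr * g_hat)
    return np.stack(points)
```

In a full loop, one would presumably evaluate the true (stochastic) gradients at the returned points on `n_parallel` workers, append them to the history, and continue from the most advanced iterate. How the next starting point is chosen is left open here, since the abstract does not specify it.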