In this work, we address unconstrained finite-sum optimization problems, with a particular focus on instances arising in large-scale deep learning. Our main interest lies in exploring the relationship between recent line-search approaches for stochastic optimization in the overparametrized regime and momentum directions. First, we point out that combining these two elements in a computationally efficient way is not straightforward. To this end, we propose a solution based on mini-batch persistency. We then introduce an algorithmic framework that combines data persistency, conjugate-gradient-type rules for setting the momentum parameter, and stochastic line searches. The resulting algorithm enjoys provable convergence properties under suitable assumptions and is empirically shown to outperform other popular methods from the literature, obtaining state-of-the-art results on both convex and nonconvex large-scale training problems.
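The abstract only names the ingredients; the sketch below is our own illustrative reconstruction, not the paper's algorithm. It assumes a Polak-Ribière-type (PR+) rule for the momentum parameter, an Armijo backtracking line search on the mini-batch loss, and a persistency parameter that reuses each sampled mini-batch for a few consecutive steps, so that consecutive gradients refer to the same loss and the conjugate-gradient-type rule is well defined. All names and parameters (`step`, `persistency`, `lr0`, `c`, `rho`) are hypothetical.

```python
import numpy as np

def step(f, grad, x, g_prev, d_prev, lr0=1.0, c=1e-4, rho=0.5, max_bt=20):
    """One iterate: PR+-type momentum parameter plus an Armijo backtracking
    line search, both evaluated on the same persistent mini-batch
    (f and grad close over that batch). Illustrative sketch only."""
    g = grad(x)
    # Conjugate-gradient-type (Polak-Ribiere, clipped at zero) rule for the
    # momentum parameter; meaningful because g_prev was computed on the
    # same mini-batch as g.
    beta = max(0.0, g @ (g - g_prev) / (g_prev @ g_prev + 1e-12))
    d = -g + beta * d_prev
    if g @ d >= 0:          # safeguard: fall back to steepest descent
        d = -g
    t, fx, slope = lr0, f(x), g @ d
    for _ in range(max_bt):  # Armijo backtracking on the mini-batch loss
        if f(x + t * d) <= fx + c * t * slope:
            break
        t *= rho             # if never satisfied, keep the last (tiny) t
    return x + t * d, g, d

# Toy usage: least-squares loss; each sampled mini-batch is reused for a
# few consecutive iterations ("mini-batch persistency").
rng = np.random.default_rng(0)
A, b = rng.normal(size=(256, 10)), rng.normal(size=256)
x = np.zeros(10)
persistency = 3
for _ in range(50):
    idx = rng.choice(256, size=32, replace=False)   # sample a mini-batch
    Ab, bb = A[idx], b[idx]
    f = lambda z: 0.5 * np.mean((Ab @ z - bb) ** 2)
    grad = lambda z: Ab.T @ (Ab @ z - bb) / len(bb)
    g_prev = grad(x); d_prev = -g_prev
    for _ in range(persistency):                    # reuse the same batch
        x, g_prev, d_prev = step(f, grad, x, g_prev, d_prev)
print("final full-sample loss:", 0.5 * np.mean((A @ x - b) ** 2))
```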