Stochastic optimization algorithms are widely used for machine learning with large-scale data. However, their convergence often suffers from non-vanishing variance. Variance Reduction (VR) methods, such as SVRG and SARAH, address this issue but introduce a bottleneck by requiring periodic full gradient computations. In this paper, we explore popular VR techniques and propose an approach that eliminates the need for expensive full gradient computations. To avoid these computations and keep the method memory-efficient, we combine two key techniques: the shuffling heuristic and the idea behind the SAG/SAGA methods. For non-convex objectives, our convergence rates match those of standard shuffling methods, while under strong convexity they improve upon them. We empirically validate the efficiency of our approach and demonstrate its scalability on large-scale machine learning tasks, including image classification on the CIFAR-10 and CIFAR-100 datasets.
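As a rough illustration of the two ingredients named above (shuffled data passes combined with SAG/SAGA-style gradient memory), the sketch below runs a generic SAGA update over a shuffled permutation on a toy least-squares problem. It is not the paper's algorithm; the function name, step size, and problem setup are illustrative assumptions. The point is that caching one gradient per sample removes the periodic full-gradient pass that SVRG/SARAH require.

```python
# A minimal sketch (NOT the paper's method): SAGA-style variance reduction
# with a shuffled pass over the data. Per-sample gradients are cached, so no
# full-gradient recomputation is ever needed. All names are illustrative.
import numpy as np

def saga_shuffled_epoch(w, X, y, grad_table, grad_avg, lr=0.05, rng=None):
    """One shuffled epoch of a SAGA-style method for least squares.

    grad_table : (n, d) array holding the last stored gradient of each sample
    grad_avg   : (d,) running average of the stored gradients
    """
    n = X.shape[0]
    rng = np.random.default_rng() if rng is None else rng
    for i in rng.permutation(n):                 # shuffling heuristic
        g_new = X[i] * (X[i] @ w - y[i])         # fresh gradient of sample i
        # variance-reduced direction: fresh gradient - stale gradient + average
        direction = g_new - grad_table[i] + grad_avg
        w = w - lr * direction
        # update the gradient memory in O(d), no pass over the full dataset
        grad_avg = grad_avg + (g_new - grad_table[i]) / n
        grad_table[i] = g_new
    return w, grad_table, grad_avg

# Usage on a synthetic least-squares problem
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
y = X @ w_true
w = np.zeros(5)
grad_table = np.zeros_like(X)   # memory of per-sample gradients
grad_avg = np.zeros(5)
for _ in range(50):
    w, grad_table, grad_avg = saga_shuffled_epoch(w, X, y, grad_table, grad_avg, rng=rng)
print(np.linalg.norm(w - w_true))  # should be close to 0
```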