We introduce a streaming framework for analyzing stochastic approximation and optimization problems. The framework is analogous to solving optimization problems with time-varying mini-batches that arrive sequentially. We provide non-asymptotic convergence rates for several gradient-based algorithms: stochastic gradient (SG) descent (a.k.a. the Robbins-Monro algorithm), mini-batch SG, and time-varying mini-batch SG, as well as their iterate averages (a.k.a. Polyak-Ruppert averaging). We show i) how to accelerate convergence by choosing the learning rate according to the time-varying mini-batches, ii) that Polyak-Ruppert averaging achieves optimal convergence in the sense of attaining the Cramér-Rao lower bound, and iii) how time-varying mini-batches combined with Polyak-Ruppert averaging can reduce variance and accelerate convergence simultaneously, which is advantageous for many learning problems, such as online, sequential, and large-scale learning. We further demonstrate these favorable effects for various time-varying mini-batches.
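To make the setting concrete, the following is a minimal Python sketch of SG on sequentially arriving, time-varying mini-batches with a Polyak-Ruppert average. It is an illustration under stated assumptions, not the paper's implementation: the batch-size schedule `n_t = ceil(c_rho * t**rho)`, the learning rate `gamma_t = c_gamma * n_t**beta * t**(-alpha)`, the batch-size-weighted averaging, the least-squares test problem, and all function and parameter names are hypothetical choices made for this example.

```python
import math
import random


def batch_size(t, c_rho=1.0, rho=0.5):
    """Assumed schedule n_t = ceil(c_rho * t**rho); rho = 0 gives constant batches."""
    return max(1, math.ceil(c_rho * t ** rho))


def learning_rate(t, n_t, c_gamma=0.5, alpha=2 / 3, beta=0.0):
    """Assumed step size gamma_t = c_gamma * n_t**beta * t**(-alpha)."""
    return c_gamma * n_t ** beta * t ** (-alpha)


def streaming_sg(batches, grad, theta0, **lr_kwargs):
    """Run SG over a stream of mini-batches; return (last iterate, averaged iterate)."""
    theta = list(theta0)
    theta_bar = list(theta0)
    n_seen = 0
    for t, batch in enumerate(batches, start=1):
        n_t = len(batch)
        # Average the per-sample gradients over the current mini-batch.
        g = [0.0] * len(theta)
        for sample in batch:
            for j, g_j in enumerate(grad(theta, sample)):
                g[j] += g_j / n_t
        gamma_t = learning_rate(t, n_t, **lr_kwargs)
        theta = [th - gamma_t * g_j for th, g_j in zip(theta, g)]
        # Polyak-Ruppert average, weighting each iterate by its batch size
        # (an assumed convention for this sketch).
        n_seen += n_t
        w = n_t / n_seen
        theta_bar = [(1 - w) * tb + w * th for tb, th in zip(theta_bar, theta)]
    return theta, theta_bar


def least_squares_grad(theta, sample):
    """Gradient of 0.5 * (x . theta - y)**2 with respect to theta."""
    x, y = sample
    residual = sum(x_j * th for x_j, th in zip(x, theta)) - y
    return [residual * x_j for x_j in x]


def linear_stream(theta_star, n_batches, rng, noise_sd=0.1):
    """Yield mini-batches of (x, y) pairs from a noisy linear model."""
    for t in range(1, n_batches + 1):
        batch = []
        for _ in range(batch_size(t)):
            x = [rng.gauss(0.0, 1.0) for _ in theta_star]
            y = sum(x_j * th for x_j, th in zip(x, theta_star))
            batch.append((x, y + rng.gauss(0.0, noise_sd)))
        yield batch


def run_demo(seed=0, n_batches=2000):
    """Fit a 2-D linear model from a stream; return (truth, last, averaged)."""
    rng = random.Random(seed)
    theta_star = [1.0, -2.0]
    stream = linear_stream(theta_star, n_batches, rng)
    theta_last, theta_bar = streaming_sg(stream, least_squares_grad, [0.0, 0.0])
    return theta_star, theta_last, theta_bar


if __name__ == "__main__":
    truth, last, averaged = run_demo()
    print("true parameter:  ", truth)
    print("last iterate:    ", last)
    print("averaged iterate:", averaged)
```

Setting `rho=0` in `batch_size` recovers constant mini-batches, and `c_rho=1, rho=0` recovers plain SG with one sample per step. A nonzero `beta` lets the step size scale with the batch size, which is one way to read "choosing the learning rate according to the time-varying mini-batches".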