Stochastic Approximation (SA) is a classical algorithm that has had a major impact on signal processing since its early days and, more recently, on machine learning, because both fields must handle large amounts of data observed with uncertainty. A prominent special case of SA is the popular stochastic (sub)gradient algorithm, which is the workhorse behind many important applications. A lesser-known fact is that the SA scheme also extends to non-stochastic-gradient algorithms such as compressed stochastic gradient, stochastic expectation-maximization, and a number of reinforcement learning algorithms. The aim of this article is to introduce the non-stochastic-gradient perspectives of SA to the signal processing and machine learning audiences by presenting a theory-backed design guideline for SA algorithms. Our central theme is a general framework that unifies existing theories of SA, including its non-asymptotic and asymptotic convergence results, and we demonstrate its application to popular non-stochastic-gradient algorithms. We build our analysis framework on classes of Lyapunov functions that satisfy a variety of mild conditions. We draw connections between non-stochastic-gradient algorithms and scenarios in which the Lyapunov function is smooth, convex, or strongly convex. Using this framework, we illustrate the convergence properties of the non-stochastic-gradient algorithms through concrete examples. Extensions to emerging variance reduction techniques for improved sample complexity are also discussed.