We study the performance of stochastic first-order methods for finding saddle points of convex-concave functions. A notorious challenge faced by such methods is that the gradients can grow arbitrarily large during optimization, which may result in instability and divergence. In this paper, we propose a simple and effective regularization technique that stabilizes the iterates and yields meaningful performance guarantees even when the domain is potentially unbounded and the gradient noise scales linearly with the size of the iterates (and is thus itself potentially unbounded). Beyond providing a set of general results, we also apply our algorithm to a specific problem in reinforcement learning, where it leads to performance guarantees for finding near-optimal policies in an average-reward Markov decision process (MDP) without prior knowledge of the bias span.
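As a rough illustration of the setting described above (not the paper's actual algorithm), the following Python sketch runs stochastic gradient descent-ascent on a toy convex-concave objective whose gradient noise scales linearly with the iterate norm, and adds a quadratic regularizer that pulls both iterates toward the origin to keep them stable. The objective, step size, noise model, and regularization weight are all hypothetical stand-ins chosen for the sketch.

```python
import numpy as np

# Illustrative sketch, not the paper's method: stochastic gradient
# descent-ascent (SGDA) on the convex-concave toy objective
#   f(x, y) = 0.5*||x||^2 + x^T A y - 0.5*||y||^2,
# whose unique saddle point is (0, 0). The gradient noise magnitude
# grows linearly with the iterate norms, mimicking the unbounded-noise
# regime; an assumed quadratic penalty (weight `reg`) stabilizes both
# players by shrinking their iterates toward the origin.

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d)) / np.sqrt(d)

def noisy_grads(x, y, noise_scale=0.1):
    """Stochastic gradients of f; noise scales with the iterate size."""
    gx = x + A @ y          # gradient in x (min player)
    gy = A.T @ x - y        # gradient in y (max player)
    scale = noise_scale * (1.0 + np.linalg.norm(x) + np.linalg.norm(y))
    gx = gx + scale * rng.standard_normal(d)
    gy = gy + scale * rng.standard_normal(d)
    return gx, gy

def regularized_sgda(steps=5000, eta=0.01, reg=0.05):
    x, y = np.ones(d), np.ones(d)
    x_avg, y_avg = np.zeros(d), np.zeros(d)
    for _ in range(steps):
        gx, gy = noisy_grads(x, y)
        # Regularized updates: the added terms reg*x and -reg*y come
        # from penalties +reg/2*||x||^2 and -reg/2*||y||^2 on the
        # min and max players, respectively.
        x = x - eta * (gx + reg * x)
        y = y + eta * (gy - reg * y)
        x_avg += x / steps
        y_avg += y / steps
    return x_avg, y_avg  # averaged iterates approximate the saddle point

x_bar, y_bar = regularized_sgda()
print("||x_bar|| =", np.linalg.norm(x_bar), " ||y_bar|| =", np.linalg.norm(y_bar))
```

In this toy instance the averaged iterates should end up near the saddle point at the origin; without the regularization terms, the noise growing with the iterate norm can drive the raw SGDA iterates to drift.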