Stochastic min-max optimization has gained interest in the machine learning community with the advancements in GANs and adversarial training. Although game optimization is fairly well understood in the deterministic setting, some issues persist in the stochastic regime. Recent work has shown that stochastic gradient descent-ascent methods such as the optimistic gradient are highly sensitive to noise or can fail to converge. Although alternative strategies exist, they can be prohibitively expensive. We introduce Omega, a method with optimistic-like updates that mitigates the impact of noise by incorporating an exponential moving average (EMA) of historic gradients in its update rule. We also explore a variation of this algorithm that incorporates momentum. Although we do not provide convergence guarantees, our experiments on stochastic games show that Omega outperforms the optimistic gradient method when applied to linear players.
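To make the idea concrete, below is a minimal sketch of an optimistic-like update in which the "previous gradient" correction term of the standard optimistic gradient is replaced by an EMA of past gradients, shown on a 1D bilinear game with noisy gradients. The step size, EMA coefficient, noise level, and the exact placement of the EMA in the update are illustrative assumptions, not the paper's definitive formulation.

```python
import numpy as np

# Sketch: optimistic-style update where the previous-gradient term
# is replaced by an EMA of historic gradients, on f(x, y) = x * y
# with additive gradient noise. Hyperparameters are assumptions.
rng = np.random.default_rng(0)
eta, beta, sigma = 0.1, 0.9, 0.5   # step size, EMA decay, noise level
x, y = 1.0, 1.0                    # min player x, max player y
ema_gx, ema_gy = 0.0, 0.0          # EMAs of past stochastic gradients

for t in range(2000):
    # stochastic gradients of f(x, y) = x * y
    gx = y + sigma * rng.standard_normal()   # df/dx (x descends)
    gy = x + sigma * rng.standard_normal()   # df/dy (y ascends)

    # optimistic-like correction: extrapolate with (g_t - EMA_{t-1})
    # instead of (g_t - g_{t-1}) as in the plain optimistic gradient
    x -= eta * (gx + (gx - ema_gx))
    y += eta * (gy + (gy - ema_gy))

    # update the EMAs of historic gradients
    ema_gx = beta * ema_gx + (1 - beta) * gx
    ema_gy = beta * ema_gy + (1 - beta) * gy

print(f"final iterate: x={x:.3f}, y={y:.3f}")
```

With `beta = 0` the correction term reduces to the plain optimistic gradient update; larger `beta` averages over more past gradients, which is the mechanism the abstract credits with reducing sensitivity to noise.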