Stochastic min-max optimization has attracted growing interest in the machine learning community, driven by advances in generative adversarial networks (GANs) and adversarial training. Although game optimization is fairly well understood in the deterministic setting, some issues persist in the stochastic regime. Recent work has shown that stochastic gradient descent-ascent methods such as the optimistic gradient can be highly sensitive to noise or fail to converge. Although alternative strategies exist, they can be prohibitively expensive computationally. We introduce Omega, a method with optimistic-like updates that mitigates the impact of noise by incorporating an exponential moving average (EMA) of historical gradients in its update rule. We also explore a variation of this algorithm that incorporates momentum. Although we do not provide convergence guarantees, our experiments on stochastic games show that Omega outperforms the optimistic gradient method when applied to linear players.
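To make the idea of an optimistic-like update with an EMA concrete, the following is a minimal sketch and not the exact update rule of the method. It assumes that the previous-gradient term of the optimistic gradient step is simply replaced by an EMA of past stochastic gradients. The toy bilinear game min_x max_y xy, the function names (`game_gradient`, `optimistic_step`, `omega_step`, `run`), and all hyperparameter values are hypothetical choices made for illustration only.

```python
import numpy as np


def game_gradient(w, rng, noise_std):
    """Noisy vector field (dL/dx, -dL/dy) = (y, -x) of the toy game L(x, y) = x * y."""
    x, y = w
    field = np.array([y, -x])
    return field + noise_std * rng.standard_normal(2)


def optimistic_step(w, grad, prev_grad, lr):
    """Optimistic gradient: a gradient step plus a correction using the previous gradient."""
    return w - lr * grad - lr * (grad - prev_grad)


def omega_step(w, grad, ema, lr, beta):
    """Assumed Omega-style step: the previous gradient is replaced by an EMA of past gradients."""
    new_w = w - lr * grad - lr * (grad - ema)
    new_ema = beta * ema + (1.0 - beta) * grad  # update the EMA after it has been used
    return new_w, new_ema


def run(method, steps=2000, lr=0.05, beta=0.9, noise_std=0.5, seed=0):
    """Run one method on the noisy toy game and return the distance to the solution (0, 0)."""
    rng = np.random.default_rng(seed)
    w = np.array([1.0, 1.0])
    memory = np.zeros(2)  # holds the previous gradient (optimistic) or the EMA (Omega-style)
    for _ in range(steps):
        g = game_gradient(w, rng, noise_std)
        if method == "optimistic":
            w, memory = optimistic_step(w, g, memory, lr), g
        else:
            w, memory = omega_step(w, g, memory, lr, beta)
    return float(np.linalg.norm(w))
```

Under this assumed form, setting `beta=0` makes the EMA equal to the last gradient, so `omega_step` reduces to `optimistic_step`. Calling `run("optimistic")` and `run("omega")` with the same seed gives a rough side-by-side comparison on this toy problem; it does not reproduce the experiments reported here.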