Deep learning has proven effective in a wide variety of loss-minimization problems. However, many applications of interest, such as minimizing the projected Bellman error and min-max optimization, cannot be modelled as minimizing a scalar loss function and instead correspond to solving a variational inequality (VI) problem. This difference in setting has caused many practical challenges, as naive gradient-based approaches from supervised learning tend to diverge or cycle in the VI case. In this work, we propose a principled surrogate-based approach, compatible with deep learning, for solving VIs. We show that our surrogate-based approach has three main benefits: (1) under assumptions that are realistic in practice (hidden monotone structure, interpolation, and sufficient optimization of the surrogates), it guarantees convergence; (2) it provides a unifying perspective on existing methods; and (3) it is amenable to existing deep learning optimizers such as Adam. Experimentally, we demonstrate that our surrogate-based approach is effective in min-max optimization and in minimizing the projected Bellman error. Furthermore, in the deep reinforcement learning setting, we propose a novel variant of TD(0) that is more compute- and sample-efficient.
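The surrogate-based idea can be illustrated with a minimal sketch, not taken from the paper itself: at each outer step we freeze a target obtained from one VI-operator step, then run an inner loop that minimizes a squared-error surrogate toward that target (the inner loop is where a deep learning optimizer like Adam would be used). The operator `F`, matrix `A`, step sizes, and iteration counts below are all illustrative assumptions; here the VI is strongly monotone, so the scheme converges to the solution `z* = 0`.

```python
import numpy as np

# Toy strongly monotone VI: find z* with F(z*) = 0, where F(z) = A z.
# The symmetric part of A is the identity, so F is strongly monotone
# and the unique solution is z* = 0. (A, eta, etc. are illustrative.)
A = np.array([[1.0, 1.0],
              [-1.0, 1.0]])

def F(z):
    return A @ z

z = np.array([2.0, -1.0])  # initial prediction
eta = 0.1                  # outer step size in prediction space
inner_steps = 50           # optimization budget for each surrogate
alpha = 0.2                # inner (surrogate) step size

for t in range(200):
    # Freeze a target: one operator step from the current prediction.
    target = z - eta * F(z)
    # Inner loop: minimize the surrogate loss ||w - target||^2.
    # With a deep network, w would be the network output and this loop
    # would be a few steps of SGD/Adam on the network parameters.
    w = z.copy()
    for _ in range(inner_steps):
        w -= alpha * 2.0 * (w - target)
    z = w

print(np.linalg.norm(z))  # norm shrinks toward 0, the VI solution
```

Note that applying `F` naively as a loss gradient is exactly what diverges or cycles on such rotational problems; the surrogate reformulation keeps each inner problem a plain minimization, which is what makes standard deep learning optimizers applicable.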