We consider minimizing functions for which it is expensive to compute the (possibly stochastic) gradient. Such functions are prevalent in reinforcement learning, imitation learning, and adversarial training. Our target optimization framework uses the (expensive) gradient computation to construct surrogate functions in a \emph{target space} (e.g., the logits output by a linear model for classification) that can be minimized efficiently. This allows multiple parameter updates to the model per gradient computation, amortizing its cost. In the full-batch setting, we prove that our surrogate is a global upper bound on the loss and can be (locally) minimized using a black-box optimization algorithm. We prove that the resulting majorization-minimization algorithm ensures convergence to a stationary point of the loss. Next, we instantiate our framework in the stochastic setting and propose the $SSO$ algorithm, which can be viewed as projected stochastic gradient descent in the target space. This connection enables us to prove theoretical guarantees for $SSO$ when minimizing convex functions. Our framework allows the use of standard stochastic optimization algorithms to construct surrogates that can be minimized by any deterministic optimization method. To evaluate our framework, we consider a suite of supervised learning and imitation learning problems. Our experiments indicate the benefits of target optimization and the effectiveness of $SSO$.
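To make the surrogate construction concrete, here is a minimal full-batch sketch. It assumes a linear model with targets $z = X\theta$, the mean logistic loss $\ell(z)$ over $n$ examples with labels in $\{-1, +1\}$, and a Euclidean proximity term. Under these assumptions, the surrogate at iterate $\theta_t$ is $g_t(\theta) = \ell(z_t) + \langle \nabla \ell(z_t), X\theta - z_t \rangle + \frac{1}{2\eta}\|X\theta - z_t\|^2$ with $z_t = X\theta_t$. This loss is $L$-smooth in $z$ with $L = 1/(4n)$, so choosing $\eta = 1/L$ makes $g_t$ a global upper bound on $\ell(X\theta)$. Each outer iteration then needs only one gradient of $\ell$. The loss, step sizes, data generator, and all function and parameter names are illustrative assumptions. This is not the authors' implementation of $SSO$ or of its full-batch counterpart.

```python
import numpy as np


def logistic_loss(z, y):
    """Mean logistic loss of targets (logits) z with labels y in {-1, +1}."""
    return np.mean(np.logaddexp(0.0, -y * z))


def logistic_grad(z, y):
    """Gradient of the mean logistic loss with respect to the targets z."""
    # d/dz log(1 + exp(-y z)) = -y * sigmoid(-y z); the tanh form avoids overflow.
    sigmoid_neg = 0.5 * (1.0 - np.tanh(0.5 * y * z))
    return -y * sigmoid_neg / z.shape[0]


def target_optimization(X, y, outer_steps=20, inner_steps=10):
    """Full-batch majorization-minimization in target space (illustrative sketch).

    Each outer step calls the (assumed expensive) target-space gradient once,
    then takes `inner_steps` cheap gradient steps on the surrogate in theta.
    """
    n, d = X.shape
    theta = np.zeros(d)
    eta = 4.0 * n  # 1/L, since the mean logistic loss is (1/(4n))-smooth in z
    # The surrogate is (||X||_2^2 / eta)-smooth in theta; use the matching step size.
    alpha = eta / np.linalg.norm(X, 2) ** 2
    losses = [logistic_loss(X @ theta, y)]
    for _ in range(outer_steps):
        z_t = X @ theta
        grad_z = logistic_grad(z_t, y)  # the only loss-gradient call per outer step
        # Up to a constant, the surrogate equals ||X theta - z_tilde||^2 / (2 eta).
        z_tilde = z_t - eta * grad_z
        for _ in range(inner_steps):
            theta = theta - alpha * (X.T @ (X @ theta - z_tilde)) / eta
        losses.append(logistic_loss(X @ theta, y))
    return theta, losses


def make_data(n=200, d=5, seed=0):
    """Synthetic, noisy binary classification data (hypothetical example)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = np.where(X @ w_true + 0.5 * rng.normal(size=n) >= 0, 1.0, -1.0)
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    theta, losses = target_optimization(X, y)
    print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Expanding the square shows that, up to a constant, $g_t(\theta) = \frac{1}{2\eta}\|X\theta - (z_t - \eta \nabla \ell(z_t))\|^2$. Exactly minimizing the surrogate therefore projects a gradient step taken in target space onto the set of targets the linear model can realize. The inner loop approximates that projection with a few gradient steps on $\theta$. Because $\ell(X\theta_{t+1}) \le g_t(\theta_{t+1}) \le g_t(\theta_t) = \ell(z_t)$, the recorded losses should be non-increasing. A stochastic variant would replace $\nabla \ell(z_t)$ with a minibatch estimate, which this sketch does not attempt.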