We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy. Our algorithm is based on approximate proximal point iterations on the proxy combined with relatively few stochastic gradients from the objective. When the difference between the objective and the proxy is $\delta$-smooth, our algorithm guarantees convergence at a rate matching stochastic gradient descent on a $\delta$-smooth objective, which can lead to substantially better sample efficiency. Our algorithm has many potential applications in machine learning, and provides a principled means of leveraging synthetic data, physics simulators, mixed public and private data, and more.
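The structure described in the abstract (an outer loop that draws one stochastic gradient from the objective, followed by an approximate proximal point step computed using only the proxy) can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' exact method: the names `F` (objective), `h` (proxy), `eta`, `approx_prox_on_proxy`, `minimize_with_proxy`, and `run_demo` are hypothetical, the gradient-corrected subproblem is one plausible instantiation, and the quadratic test functions and step sizes are chosen only so the example runs.

```python
import numpy as np


def approx_prox_on_proxy(x_t, g_obj, grad_proxy, eta, inner_steps, inner_lr):
    """Approximately solve an assumed proxy subproblem by gradient descent.

    Subproblem (an illustrative choice, not taken from the source):
        min_x  h(x) + <g_obj - grad h(x_t), x> + ||x - x_t||^2 / (2 * eta)
    Only gradients of the cheap proxy h are used inside this loop.
    """
    correction = g_obj - grad_proxy(x_t)  # shifts the proxy gradient toward the objective's
    x = x_t.copy()
    for _ in range(inner_steps):
        x = x - inner_lr * (grad_proxy(x) + correction + (x - x_t) / eta)
    return x


def minimize_with_proxy(x0, stoch_grad_obj, grad_proxy, eta, outer_steps,
                        inner_steps=50, inner_lr=0.2):
    """Outer loop: one expensive stochastic gradient of F per iteration."""
    x = np.asarray(x0, dtype=float)
    for _ in range(outer_steps):
        g = stoch_grad_obj(x)  # the only access to the objective
        x = approx_prox_on_proxy(x, g, grad_proxy, eta, inner_steps, inner_lr)
    return x


def run_demo(noise_std=0.0, seed=0):
    """Toy quadratics: F(x) = x'Ax/2 - b'x and a proxy h with nearby curvature."""
    rng = np.random.default_rng(seed)
    A = np.diag([1.0, 4.0])
    b = np.array([1.0, -2.0])
    # F - h is quadratic with Hessian A - A_h, so it is 0.15-smooth here.
    A_h = A + 0.1 * np.array([[1.0, 0.5], [0.5, 1.0]])
    b_h = b + 0.05

    def stoch_grad_obj(x):
        # Exact gradient of F plus Gaussian noise (stands in for a costly oracle).
        return A @ x - b + noise_std * rng.standard_normal(x.shape)

    def grad_proxy(x):
        return A_h @ x - b_h

    x = minimize_with_proxy(np.zeros(2), stoch_grad_obj, grad_proxy,
                            eta=5.0, outer_steps=30)
    x_star = np.linalg.solve(A, b)  # minimizer of F, used only for evaluation
    return float(np.linalg.norm(x - x_star))
```

Calling `run_demo(0.0)` returns the distance between the final iterate and the minimizer of `F` when the objective's gradients are noiseless; passing a positive `noise_std` shows how gradient noise limits the accuracy reached. In this sketch each outer iteration spends one gradient of `F` and 50 gradients of `h`, which reflects the intended trade: many cheap proxy evaluations in exchange for few expensive objective evaluations.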