We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy. Our algorithm is based on approximate proximal point iterations on the proxy combined with relatively few stochastic gradients from the objective. When the difference between the objective and the proxy is $\delta$-smooth, our algorithm guarantees convergence at a rate matching stochastic gradient descent on a $\delta$-smooth objective, which can lead to substantially better sample efficiency. Our algorithm has many potential applications in machine learning, and provides a principled means of leveraging synthetic data, physics simulators, mixed public and private data, and more.
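To make the iteration described above concrete, here is a minimal sketch under stated assumptions; it is one plausible reading of the abstract, not necessarily the exact method. Write $f$ for the objective and $h$ for the proxy. Each outer step draws one stochastic gradient $\hat{g}_t \approx \nabla f(x_t)$ and then approximately solves the proximal subproblem $\min_y \; h(y) + \langle \hat{g}_t - \nabla h(x_t), y \rangle + \frac{1}{2\eta}\|y - x_t\|^2$ using only proxy gradients. The function name `proxy_prox_descent`, its parameters, the gradient-descent inner solver, the step size $\eta = 1/\delta$, and the quadratic toy problem are all illustrative assumptions.

```python
import numpy as np

def proxy_prox_descent(grad_f_stoch, grad_h, x0, eta, outer_steps, inner_steps, inner_lr):
    """Hypothetical sketch: approximate proximal point steps on a proxy h,
    corrected by one stochastic gradient of the objective f per outer step."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(outer_steps):
        # The only expensive call in this outer step: one stochastic gradient of f.
        correction = grad_f_stoch(x) - grad_h(x)
        anchor = x.copy()
        y = x.copy()
        # Approximately minimize
        #   h(y) + <correction, y> + ||y - anchor||^2 / (2 * eta)
        # with plain gradient descent, using cheap proxy gradients only.
        for _ in range(inner_steps):
            y = y - inner_lr * (grad_h(y) + correction + (y - anchor) / eta)
        x = y
    return x

# Toy problem (assumed for illustration): two quadratics whose difference
# f - h has a Hessian with spectral norm delta, so f - h is delta-smooth.
rng = np.random.default_rng(0)
dim, delta, noise_std = 5, 0.5, 1e-3

A_h = np.diag(np.linspace(1.0, 50.0, dim))   # ill-conditioned proxy Hessian
M = rng.standard_normal((dim, dim))
S = (M + M.T) / 2.0
S /= np.linalg.norm(S, 2)                    # symmetric, spectral norm 1
A_f = A_h + delta * S                        # objective Hessian
b = np.ones(dim)

def grad_h(x):
    return A_h @ x - b

def grad_f_stoch(x):
    # Exact gradient of f plus Gaussian noise, standing in for a costly sample.
    return A_f @ x - b + noise_std * rng.standard_normal(dim)

eta = 1.0 / delta                            # step size set by delta, not by the smoothness of f
inner_lr = 1.0 / (A_h.max() + 1.0 / eta)     # safe step for the inner subproblem

x_star = np.linalg.solve(A_f, b)             # true minimizer of f
x_hat = proxy_prox_descent(grad_f_stoch, grad_h, np.zeros(dim), eta,
                           outer_steps=30, inner_steps=200, inner_lr=inner_lr)
print("relative error:", np.linalg.norm(x_hat - x_star) / np.linalg.norm(x_star))
```

In this toy run the objective is queried for only 30 stochastic gradients, while the proxy absorbs the remaining work (about 6,000 cheap gradient evaluations). The outer step size depends on $\delta$ rather than on the much larger curvature of $f$ itself, which is the kind of saving the abstract describes.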