We consider the problem of minimizing the sum of two convex functions. One of these functions has Lipschitz-continuous gradients and can be accessed via stochastic oracles, while the other is "simple". We provide a Bregman-type algorithm whose function values converge at an accelerated rate to a ball containing the minimum; the radius of this ball depends on problem-dependent constants, including the variance of the stochastic oracle. We further show that this algorithmic setup naturally leads to a variant of Frank-Wolfe that achieves acceleration under parallelization. More precisely, when minimizing a smooth convex function on a bounded domain, we show that one can achieve an $\epsilon$ primal-dual gap (in expectation) in $\tilde{O}(1/\sqrt{\epsilon})$ iterations, accessing only gradients of the original function and a linear maximization oracle, with $O(1/\sqrt{\epsilon})$ computing units running in parallel. We illustrate this fast convergence on synthetic numerical experiments.
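For context, the sketch below shows the classical Frank-Wolfe template that the parallel variant builds on: each iteration calls a linear maximization oracle (LMO) and takes a convex combination step. This is a minimal illustration in Python, not the paper's accelerated algorithm; the choice of the $\ell_1$ ball, the helper names `lmo_l1_ball` and `frank_wolfe`, and the standard $2/(t+2)$ step size are illustrative assumptions. The accelerated variant described above would dispatch a batch of $O(1/\sqrt{\epsilon})$ LMO calls in parallel at the marked line.

```python
import numpy as np

# Illustrative sketch of classical Frank-Wolfe on the l1 ball.
# NOT the paper's accelerated method: it only marks where the
# parallel batch of LMO calls would be issued.

def lmo_l1_ball(grad, radius=1.0):
    """LMO for the l1 ball: argmax over ||s||_1 <= radius of <-grad, s>.

    The maximizer is a signed, scaled coordinate vector at the
    largest-magnitude entry of the gradient.
    """
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s

def frank_wolfe(grad_f, x0, n_iters=200, radius=1.0):
    x = x0.copy()
    for t in range(n_iters):
        g = grad_f(x)
        # Single LMO call; the parallel variant would issue a batch
        # of O(1/sqrt(eps)) such calls here on separate workers.
        s = lmo_l1_ball(g, radius)
        gamma = 2.0 / (t + 2)  # standard open-loop step size
        x = (1 - gamma) * x + gamma * s
    return x

# Usage example (hypothetical data): least squares over the l1 ball.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
grad = lambda x: A.T @ (A @ x - b)
x_hat = frank_wolfe(grad, np.zeros(20))
```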