Frank-Wolfe algorithm for DC optimization problem

In this paper, we formulate two versions of the Frank-Wolfe algorithm (also known as the conditional gradient method) with an adaptive step size for solving the DC optimization problem. The DC objective function consists of two components: the first is assumed to be differentiable with a Lipschitz continuous gradient, while the second is assumed only to be convex. The second version builds on the first and uses finite differences to approximate the gradient of the first component of the objective function. In contrast to earlier formulations, which rely on a curvature/Lipschitz-type constant of the objective function, the step size computed here does not require any constant associated with the components. For the first version, we establish that the algorithm is well-defined and that every limit point of the generated sequence is a stationary point of the problem. We also introduce the class of weak-star-convex functions and show that, although these functions are non-convex in general, the first version of the algorithm minimizes them with a convergence rate of ${\cal O}(1/k)$. In the second version of the Frank-Wolfe algorithm, the finite difference used to approximate the gradient is computed with a step size that is adaptively updated using the two previous iterations. Unlike previous applications of finite differences in the Frank-Wolfe algorithm, which provided approximate gradients with absolute error, the one used here yields a relative error, which simplifies the analysis of the algorithm. In this case, we show that all limit points of the sequence generated by the second version of the Frank-Wolfe algorithm are stationary points of the problem under consideration, and we establish that the duality gap converges at a rate of ${\cal O}(1/\sqrt{k})$.
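To make the setting concrete, the following is a minimal sketch of a Frank-Wolfe iteration for a DC objective with a step size that does not need a known Lipschitz constant. It is not the algorithm of the paper: the abstract does not state the step-size rule, so a generic backtracking estimate of the constant is used instead, and the gradient is exact rather than a finite-difference approximation. The decomposition f = g - h, the names `frank_wolfe_dc`, `lmo`, `L0`, and the toy problem on the box [-1, 1]^2 are all assumptions made for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def frank_wolfe_dc(g, grad_g, h, subgrad_h, lmo, x0, L0=1.0, max_iter=200, tol=1e-10):
    """Sketch: minimize f = g - h over a compact convex set given by `lmo`.

    g is assumed smooth, h convex; `lmo(c)` returns argmin over the set of <c, p>.
    The constant L is only a running estimate found by backtracking (assumption,
    not the paper's rule).
    """
    f = lambda z: g(z) - h(z)

    def direction(x):
        # Linearize g and replace h by a subgradient, then call the oracle.
        c = [a - b for a, b in zip(grad_g(x), subgrad_h(x))]
        p = lmo(c)
        d = [pi - xi for pi, xi in zip(p, x)]
        return d, -dot(c, d)  # gap = <c, x - p> >= 0 for feasible x

    x, L = list(x0), L0
    for _ in range(max_iter):
        d, gap = direction(x)
        if gap <= tol:
            break
        dd, fx = dot(d, d), f(x)
        L = max(L / 2.0, 1e-12)  # optimistic guess, corrected below if needed
        while True:
            lam = min(1.0, gap / (L * dd))
            x_new = [xi + lam * di for xi, di in zip(x, d)]
            # Sufficient-decrease test implied by an L-Lipschitz gradient of g
            # and convexity of h; small slack guards against rounding.
            if f(x_new) <= fx - lam * gap + 0.5 * L * lam * lam * dd + 1e-12:
                break
            L *= 2.0
        x = x_new
    return x, direction(x)[1]

# Toy DC problem (hypothetical): g(x) = 0.5*||x - a||^2, h(x) = ||x||_1, box [-1, 1]^2.
a = [0.3, -0.2]
g = lambda x: 0.5 * sum((xi - ai) ** 2 for xi, ai in zip(x, a))
grad_g = lambda x: [xi - ai for xi, ai in zip(x, a)]
h = lambda x: sum(abs(xi) for xi in x)
subgrad_h = lambda x: [float((xi > 0) - (xi < 0)) for xi in x]
lmo = lambda c: [-1.0 if ci > 0 else 1.0 for ci in c]  # vertex of the box minimizing <c, p>

x0 = [0.5, 0.5]
x_out, gap_out = frank_wolfe_dc(g, grad_g, h, subgrad_h, lmo, x0)
```

The backtracking test holds whenever the estimate `L` is at least the true Lipschitz constant of the gradient of g, so the inner loop terminates and each accepted step does not increase f (up to the rounding slack). The returned `gap_out` is the Frank-Wolfe gap at the final iterate, which is zero exactly at stationary points of this linearized model.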
