We consider the problem of minimizing a convex function over a closed convex set using projected gradient descent (PGD). We propose a fully parameter-free version of AdaGrad that adapts both to the distance between the initialization and the optimum and to the sum of the squared norms of the subgradients. Our algorithm handles projection steps and requires no restarts, no reweighting along the trajectory, and no gradient evaluations beyond those of classical PGD. It also achieves optimal rates of convergence for the cumulative regret, up to logarithmic factors. We extend our approach to stochastic optimization and report numerical experiments supporting the theory.
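For orientation, the classical baseline that the abstract starts from can be sketched as follows. This is not the parameter-free algorithm proposed in the paper. It is a minimal sketch of projected gradient descent with the standard AdaGrad-norm step size, which still needs a hand-tuned estimate of the distance to the optimum. The function names, the choice of a Euclidean ball as the feasible set, and the parameter `dist_guess` are all assumptions made for illustration.

```python
import math


def project_onto_ball(x, radius):
    """Euclidean projection of x onto the ball {z : ||z|| <= radius}."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def projected_adagrad_norm(grad, x0, radius, dist_guess, steps):
    """Projected (sub)gradient descent with the AdaGrad-norm step size
    eta_t = dist_guess / sqrt(sum_{s<=t} ||g_s||^2).

    dist_guess is a hand-tuned stand-in for the unknown distance between
    x0 and the optimum (a hypothetical parameter of this sketch).
    Returns the running average of the points at which grad was queried.
    """
    x = list(x0)
    sq_sum = 0.0  # running sum of squared (sub)gradient norms
    avg = [0.0] * len(x)
    for t in range(steps):
        # Update the running average with the current query point.
        avg = [a + (xi - a) / (t + 1) for a, xi in zip(avg, x)]
        g = grad(x)
        sq_sum += sum(v * v for v in g)
        if sq_sum == 0.0:
            break  # every gradient so far was zero, so x is stationary
        eta = dist_guess / math.sqrt(sq_sum)
        step = [xi - eta * gi for xi, gi in zip(x, g)]
        x = project_onto_ball(step, radius)
    return avg


# Example: minimize 0.5 * ||x - c||^2 over the unit ball, with c outside it.
c = [3.0, 4.0]
x_avg = projected_adagrad_norm(
    grad=lambda x: [xi - ci for xi, ci in zip(x, c)],
    x0=[0.0, 0.0],
    radius=1.0,
    dist_guess=1.0,
    steps=200,
)
```

In this example the constrained minimizer is the projection of `c` onto the unit ball, and the averaged iterate approaches it. The step size depends on `dist_guess`, which is exactly the kind of tuning a parameter-free method aims to remove.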