Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions: (1) in the single-agent setting, it achieves an optimal regret of $\Theta(\log T)$ for strongly convex cost functions; and (2) in the multi-agent setting of strongly monotone games, with each agent employing OGD, we obtain last-iterate convergence of the joint action to a unique Nash equilibrium at an optimal rate of $\Theta(\frac{1}{T})$. While these finite-time guarantees highlight its merits, OGD has the drawback that it requires knowing the strong convexity/monotonicity parameters. In this paper, we design a fully adaptive OGD algorithm, \textsf{AdaOGD}, that does not require a priori knowledge of these parameters. In the single-agent setting, our algorithm achieves $O(\log^2(T))$ regret under strong convexity, which is optimal up to a log factor. Further, if each agent employs \textsf{AdaOGD} in strongly monotone games, the joint action converges in a last-iterate sense to a unique Nash equilibrium at a rate of $O(\frac{\log^3 T}{T})$, again optimal up to log factors. We illustrate our algorithms in a learning version of the classical newsvendor problem, where due to lost sales, only (noisy) gradient feedback can be observed. Our results immediately yield the first feasible and near-optimal algorithm for both the single-retailer and multi-retailer settings. We also extend our results to the more general setting of exp-concave cost functions and games, using the online Newton step (ONS) algorithm.
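To make the dependence on the strong convexity parameter concrete, the following is a minimal sketch of standard projected OGD with the classical step size $\eta_t = \frac{1}{\mu t}$. It is not \textsf{AdaOGD}, whose update rule is not given in this abstract. The function names, the interval feasible set, and the quadratic losses $f_t(x) = \frac{\mu}{2}(x - z_t)^2$ are assumptions chosen only for illustration.

```python
def project(x, lo, hi):
    """Euclidean projection of a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, x))


def ogd(grad, x0, mu, T, lo=-1.0, hi=1.0):
    """Projected online gradient descent with step size 1 / (mu * t).

    grad(t, x) returns the gradient of the round-t cost at x.
    mu is the strong convexity parameter, which this non-adaptive
    variant must be given in advance (hypothetical interface).
    """
    x = x0
    iterates = [x]
    for t in range(1, T + 1):
        g = grad(t, x)              # gradient feedback for round t
        eta = 1.0 / (mu * t)        # step size depends on the known mu
        x = project(x - eta * g, lo, hi)
        iterates.append(x)
    return iterates


# Illustrative losses f_t(x) = (mu / 2) * (x - z_t)^2 with made-up targets z_t.
mu = 2.0
targets = [0.5, -0.25, 0.75, 0.0]
iterates = ogd(lambda t, x: mu * (x - targets[t - 1]), x0=0.0, mu=mu, T=len(targets))
print(iterates[-1])
```

For these particular quadratic losses, the update reduces to a running average of the targets as long as the projection is inactive, which makes the role of $\mu$ in the step size easy to check by hand. If $\mu$ is misspecified, the step sizes no longer match the curvature, which is the gap an adaptive scheme is meant to close.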