Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions: (1) in the single-agent setting, it achieves an optimal regret of $\Theta(\log T)$ for strongly convex cost functions; and (2) in the multi-agent setting of strongly monotone games, when each agent employs OGD, the joint action converges in the last iterate to the unique Nash equilibrium at an optimal rate of $\Theta(\frac{1}{T})$. While these finite-time guarantees highlight its merits, OGD has a drawback: it requires knowing the strong convexity/monotonicity parameters in advance. In this paper, we design a fully adaptive OGD algorithm, \textsf{AdaOGD}, that does not require a priori knowledge of these parameters. In the single-agent setting, our algorithm achieves $O(\log^2 T)$ regret under strong convexity, which is optimal up to a log factor. Further, if each agent employs \textsf{AdaOGD} in a strongly monotone game, the joint action converges in the last iterate to the unique Nash equilibrium at a rate of $O(\frac{\log^3 T}{T})$, again optimal up to log factors. We illustrate our algorithms in a learning version of the classical newsvendor problem, where, because of lost sales, only (noisy) gradient feedback can be observed. Our results immediately yield the first feasible and near-optimal algorithm for both the single-retailer and multi-retailer settings. We also extend our results to the more general setting of exp-concave cost functions and games, using the online Newton step (ONS) algorithm.
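To make the parameter dependence concrete, here is a minimal sketch of classical projected OGD with the standard step size $\eta_t = \frac{1}{\mu t}$ for $\mu$-strongly convex costs. It is not \textsf{AdaOGD}, whose details the abstract does not give. The quadratic costs $f_t(x) = \frac{\mu}{2}\|x - z_t\|^2$, the ball-shaped feasible set, and the names `project_ball`, `ogd_strongly_convex`, and `run_demo` are all illustrative assumptions.

```python
import numpy as np


def project_ball(x, radius):
    """Euclidean projection onto the ball of the given radius centered at 0."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def ogd_strongly_convex(grad_fns, x0, mu, radius):
    """Classical projected OGD; the step size 1 / (mu * t) needs mu up front."""
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for t, grad in enumerate(grad_fns, start=1):
        eta = 1.0 / (mu * t)  # this is where knowledge of mu enters
        x = project_ball(x - eta * grad(x), radius)
        iterates.append(x.copy())
    return iterates


def run_demo(T=200, mu=1.0, seed=0):
    """Run OGD on assumed quadratic costs f_t(x) = (mu / 2) * ||x - z_t||^2."""
    rng = np.random.default_rng(seed)
    targets = rng.uniform(-1.0, 1.0, size=(T, 2))  # hypothetical cost centers z_t
    grad_fns = [lambda x, z=z: mu * (x - z) for z in targets]
    iterates = ogd_strongly_convex(grad_fns, np.zeros(2), mu, radius=2.0)

    def loss(x, z):
        return 0.5 * mu * float(np.sum((x - z) ** 2))

    # The best fixed action in hindsight is the mean of the centers,
    # which lies inside the feasible ball for these targets.
    best_fixed = targets.mean(axis=0)
    # iterates[t - 1] is the action played against the t-th cost.
    regret = sum(loss(x, z) - loss(best_fixed, z) for x, z in zip(iterates, targets))
    return iterates, targets, regret


if __name__ == "__main__":
    iterates, targets, regret = run_demo()
    print("regret after T rounds:", regret)
```

In this toy instance the update reduces to a running average of the centers $z_t$, so the regret can be checked against the standard $\frac{G^2}{2\mu}(1 + \log T)$ bound. The step size is computed directly from `mu`, and that is the requirement an adaptive method has to remove.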