Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions: (1) in the single-agent setting, it achieves an optimal regret of $\Theta(\log T)$ for strongly convex cost functions; and (2) in the multi-agent setting of strongly monotone games, with each agent employing OGD, we obtain last-iterate convergence of the joint action to a unique Nash equilibrium at an optimal rate of $\Theta(\frac{1}{T})$. While these finite-time guarantees highlight its merits, OGD has the drawback that it requires knowing the strong convexity/monotonicity parameters. In this paper, we design a fully adaptive OGD algorithm, \textsf{AdaOGD}, that does not require a priori knowledge of these parameters. In the single-agent setting, our algorithm achieves $O(\log^2 T)$ regret under strong convexity, which is optimal up to a log factor. Further, if each agent employs \textsf{AdaOGD} in strongly monotone games, the joint action converges in a last-iterate sense to a unique Nash equilibrium at a rate of $O(\frac{\log^3 T}{T})$, again optimal up to log factors. We illustrate our algorithms in a learning version of the classical newsvendor problem, where, due to lost sales, only (noisy) gradient feedback can be observed. Our results immediately yield the first feasible and near-optimal algorithm for both the single-retailer and multi-retailer settings. We also extend our results to the more general setting of exp-concave cost functions and games, using the online Newton step (ONS) algorithm.
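To make concrete the drawback noted above, here is a minimal sketch of classical, non-adaptive OGD for strongly convex costs. It is not \textsf{AdaOGD}, whose update rule is not given in this abstract. It assumes the standard step size $\eta_t = \frac{1}{\mu t}$, where $\mu$ is the strong convexity parameter, so $\mu$ must be supplied by the caller. The function names, the one-dimensional interval constraint, and the quadratic costs $f_t(x) = \frac{\mu}{2}(x - z_t)^2$ are all illustrative assumptions.

```python
from typing import Callable, List, Sequence


def project_interval(x: float, lo: float, hi: float) -> float:
    """Euclidean projection of a scalar onto the interval [lo, hi]."""
    return min(max(x, lo), hi)


def ogd_strongly_convex(
    grads: Sequence[Callable[[float], float]],
    x0: float,
    mu: float,
    lo: float,
    hi: float,
) -> List[float]:
    """Classical projected OGD with step size 1 / (mu * t).

    Assumption: each cost is mu-strongly convex and mu is known in advance.
    This is exactly the knowledge an adaptive method aims to avoid.
    """
    x = project_interval(x0, lo, hi)
    iterates = [x]
    for t, grad in enumerate(grads, start=1):
        eta = 1.0 / (mu * t)  # step size depends on the known parameter mu
        x = project_interval(x - eta * grad(x), lo, hi)
        iterates.append(x)
    return iterates


def make_quadratic_grad(mu: float, z: float) -> Callable[[float], float]:
    """Gradient of the illustrative cost f(x) = (mu / 2) * (x - z) ** 2."""
    return lambda x: mu * (x - z)


# Hypothetical toy data: five quadratic costs with targets inside [0, 1].
mu = 2.0
targets = [0.2, 0.8, 0.5, 0.9, 0.1]
grads = [make_quadratic_grad(mu, z) for z in targets]
iterates = ogd_strongly_convex(grads, x0=0.0, mu=mu, lo=0.0, hi=1.0)
```

For these particular quadratic costs, and as long as the projection never binds, the update reduces to $x_{t+1} = x_t - \frac{1}{t}(x_t - z_t)$, so each iterate is the running average of the targets seen so far. Passing a wrong value of `mu` changes the step sizes and breaks this identity, which is the dependence a parameter-free method is meant to remove.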