In online convex optimization, the player aims to minimize regret, that is, the difference between her loss and that of the best fixed decision in hindsight over the entire repeated game. Algorithms that minimize (standard) regret may converge to a fixed decision, which is undesirable in changing or dynamic environments. This motivates stronger performance metrics, notably adaptive and dynamic regret. Adaptive regret is the maximum regret over any contiguous sub-interval of time. Dynamic regret is the difference between the player's total cost and that of the best sequence of decisions in hindsight. State-of-the-art algorithms for both adaptive and dynamic regret minimization incur a computational penalty, typically a multiplicative factor that grows logarithmically in the number of game iterations. In this paper we show how to reduce this computational penalty to be doubly logarithmic in the number of game iterations while retaining near-optimal adaptive and dynamic regret bounds.
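To make the three metrics concrete, the following is a minimal brute-force sketch of their definitions, not the algorithm proposed in this paper. It assumes, purely for illustration, that the decision set is a small finite list of candidates, so that "best in hindsight" can be found by enumeration. The function names and the toy loss table are hypothetical.

```python
from typing import List


def standard_regret(player_losses: List[int], decision_losses: List[List[int]]) -> int:
    """Player's total loss minus the total loss of the best fixed decision.

    player_losses[t]      -- loss the player actually paid in round t
    decision_losses[t][k] -- loss candidate decision k would have paid in round t
    (a finite candidate set is an illustrative assumption)
    """
    num_decisions = len(decision_losses[0])
    best_fixed = min(
        sum(row[k] for row in decision_losses) for k in range(num_decisions)
    )
    return sum(player_losses) - best_fixed


def adaptive_regret(player_losses: List[int], decision_losses: List[List[int]]) -> int:
    """Maximum standard regret over every contiguous interval of rounds."""
    num_rounds = len(player_losses)
    return max(
        standard_regret(player_losses[start:end], decision_losses[start:end])
        for start in range(num_rounds)
        for end in range(start + 1, num_rounds + 1)
    )


def dynamic_regret(player_losses: List[int], decision_losses: List[List[int]]) -> int:
    """Player's total loss minus the loss of the best per-round decision sequence."""
    return sum(player_losses) - sum(min(row) for row in decision_losses)


# Toy game: decision 0 is best in the first half, decision 1 in the second half.
decision_losses = [[0, 1], [0, 1], [1, 0], [1, 0]]
# A player that has settled on decision 0 and never moves.
player_losses = [row[0] for row in decision_losses]

print(standard_regret(player_losses, decision_losses))  # 0
print(adaptive_regret(player_losses, decision_losses))  # 2
print(dynamic_regret(player_losses, decision_losses))   # 2
```

In this toy game the player that never changes its decision has zero standard regret, yet its adaptive and dynamic regret are both 2, because it ignores the switch in the second half. This is the failure mode in changing environments that the stronger metrics are designed to expose.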