Follow-The-Regularized-Leader (FTRL) is an effective and versatile approach in online learning, where an appropriate choice of the learning rate is crucial for achieving small regret. To this end, we formulate the problem of adjusting FTRL's learning rate as a sequential decision-making problem and introduce the framework of competitive analysis. We establish a lower bound on the competitive ratio and propose update rules for the learning rate that achieve an upper bound within a constant factor of this lower bound. Specifically, we show that the optimal competitive ratio is characterized by the (approximate) monotonicity of the components of the penalty term: a constant competitive ratio is achievable if the components of the penalty term form a monotonically non-increasing sequence, and we derive a tight competitive ratio when the penalty terms are $\xi$-approximately monotone non-increasing. Our proposed update rule, referred to as \textit{stability-penalty matching}, also facilitates the construction of Best-Of-Both-Worlds (BOBW) algorithms for stochastic and adversarial environments. In these environments, our results contribute to achieving tighter regret bounds and broaden the applicability of algorithms to various settings such as multi-armed bandits, graph bandits, linear bandits, and contextual bandits.
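To make the idea behind the name \textit{stability-penalty matching} concrete, the following is a minimal sketch under assumptions that the abstract itself does not state. It assumes that the regret of FTRL with learning-rate parameters $\beta_t$ is controlled by a sum of per-round stability terms $z_t / \beta_t$ and penalty terms $(\beta_{t+1} - \beta_t) h_{t+1}$, and it chooses $\beta_{t+1}$ so that the two terms are equal in every round. The symbols $z_t$, $h_t$, $\beta_t$ and the function names are illustrative and hypothetical; the exact rule analyzed in the paper may differ.

```python
from typing import List, Sequence


def spm_learning_rates(z: Sequence[float], h: Sequence[float], beta1: float) -> List[float]:
    """Illustrative stability-penalty matching (assumed form, not the paper's exact rule).

    z[t] : assumed stability component of round t (positive), length T
    h[t] : assumed penalty component (positive), length T + 1
    beta1: initial learning-rate parameter (positive)

    Returns beta_1, ..., beta_{T+1}, chosen so that in every round
    (beta_{t+1} - beta_t) * h_{t+1} == z_t / beta_t.
    """
    if len(h) != len(z) + 1:
        raise ValueError("h must have exactly one more entry than z")
    betas = [beta1]
    for z_t, h_next in zip(z, h[1:]):
        beta = betas[-1]
        # Solve (beta_next - beta) * h_next = z_t / beta for beta_next.
        betas.append(beta + z_t / (beta * h_next))
    return betas


def per_round_terms(z: Sequence[float], h: Sequence[float], betas: Sequence[float]):
    """Return the per-round (stability, penalty) pairs under the assumed bound."""
    return [
        (z[t] / betas[t], (betas[t + 1] - betas[t]) * h[t + 1])
        for t in range(len(z))
    ]


if __name__ == "__main__":
    # Hypothetical inputs with a non-increasing penalty sequence h.
    z = [1.0, 0.5, 2.0]
    h = [1.0, 1.0, 0.8, 0.5]
    betas = spm_learning_rates(z, h, beta1=1.0)
    for stability, penalty in per_round_terms(z, h, betas):
        print(f"stability={stability:.4f}  penalty={penalty:.4f}")
```

In this sketch the sequence $\beta_t$ is non-decreasing by construction, and the printed stability and penalty values coincide in each round up to floating-point error, which is the matching property the name refers to.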