A Modern Introduction to Online Learning

From arXiv. Major update: one new chapter (Online Learning to X); massive tightening of all the math; simplification of the betting algorithm that loses a constant fraction of money; exp-concave functions are now for extended-real-valued functions; new layout for publication; added index.

In this book, I introduce the basic concepts of Online Learning through the modern view of Online Convex Optimization. Here, online learning refers to the framework of regret minimization under worst-case assumptions. I present first-order and second-order algorithms for online learning with convex losses, in Euclidean and non-Euclidean settings. All the algorithms are clearly presented as instantiations of Online Mirror Descent or Follow-The-Regularized-Leader and their variants. Particular attention is given to tuning the parameters of the algorithms and to learning in unbounded domains, through adaptive and parameter-free online learning algorithms. Non-convex losses are addressed through convex surrogate losses and randomization. The bandit setting is also briefly discussed, touching on the problems of adversarial and stochastic multi-armed bandits. Finally, I also cover advanced topics, including black-box reductions, saddle-point optimization, sequential investment, and non-stationary forms of regret analysis. The book concludes with a selection of applications of online learning to domains far from it, such as generalization theory and concentration inequalities. I tried to maintain an informal, but mathematically serious, tone throughout the book. No prior knowledge of convex analysis is required. Moreover, all the included proofs have been carefully chosen to be as simple and as short as possible. This also means that I have sometimes added one or two extra assumptions, just to simplify the proofs.
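To make "regret minimization under worst-case assumptions" concrete, here is a minimal sketch, not taken from the book, of projected online subgradient descent on an interval with absolute losses. The function names (`online_subgradient_descent`, `regret`), the choice of losses |x - z_t|, the fixed step size radius / sqrt(T), and the assumption that all targets lie inside the interval are illustrative choices made for this example only. Under those assumptions, the standard analysis gives a regret of at most radius * sqrt(T) for any sequence of targets.

```python
import math
import random
import statistics


def online_subgradient_descent(targets, radius=1.0):
    """Projected online subgradient descent on [-radius, radius].

    At round t the learner predicts x_t, then suffers the loss |x_t - z_t|.
    Returns the cumulative loss of the learner.
    """
    T = len(targets)
    eta = radius / math.sqrt(T)  # fixed step size; assumes T is known in advance
    x = 0.0                      # initial prediction
    total_loss = 0.0
    for z in targets:
        total_loss += abs(x - z)   # loss is revealed only after predicting
        g = (x > z) - (x < z)      # a subgradient of |x - z| at x (0 if x == z)
        # Subgradient step followed by projection back onto the interval.
        x = max(-radius, min(radius, x - eta * g))
    return total_loss


def regret(targets, radius=1.0):
    """Cumulative loss of the learner minus that of the best fixed point."""
    alg_loss = online_subgradient_descent(targets, radius)
    # For absolute losses a median minimizes the cumulative loss in hindsight.
    # Assumes every target lies in [-radius, radius], so the median does too.
    u = statistics.median(targets)
    best_loss = sum(abs(u - z) for z in targets)
    return alg_loss - best_loss


rng = random.Random(0)
targets = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
print("regret:", regret(targets), "bound:", math.sqrt(len(targets)))
```

The comparison against the best fixed point in hindsight is what distinguishes regret from plain cumulative loss: the targets can be arbitrary, yet the gap grows only like sqrt(T), so the average gap per round vanishes.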
