Optimal Prediction Using Expert Advice and Randomized Littlestone Dimension

A classical result in online learning characterizes the optimal mistake bound achievable by deterministic learners using the Littlestone dimension (Littlestone '88). We prove an analogous result for randomized learners: we show that the optimal expected mistake bound in learning a class $\mathcal{H}$ equals its randomized Littlestone dimension, which is the largest $d$ for which there exists a tree shattered by $\mathcal{H}$ whose average depth is $2d$. We further study optimal mistake bounds in the agnostic case, as a function of the number of mistakes made by the best function in $\mathcal{H}$, denoted by $k$. We show that the optimal randomized mistake bound for learning a class with Littlestone dimension $d$ is $k + \Theta (\sqrt{k d} + d )$. This also implies an optimal deterministic mistake bound of $2k + O (\sqrt{k d} + d )$, thus resolving an open question studied by Auer and Long ['99]. As an application of our theory, we revisit the classical problem of prediction using expert advice. About 30 years ago, Cesa-Bianchi, Freund, Haussler, Helmbold, Schapire and Warmuth studied this problem under the assumption that the best among the $n$ experts makes at most $k$ mistakes, and asked what the optimal mistake bounds are. Cesa-Bianchi, Freund, Helmbold, and Warmuth ['93, '96] provided a nearly optimal bound for deterministic learners, and left the randomized case as an open problem. We resolve this question by providing an optimal learning rule in the randomized case, and showing that its expected mistake bound equals half of the deterministic bound, up to negligible additive terms. This improves upon previous works by Cesa-Bianchi, Freund, Haussler, Helmbold, Schapire and Warmuth ['93, '97], by Abernethy, Langford, and Warmuth ['06], and by Br\^anzei and Peres ['19], which handled the regimes $k \ll \log n$ or $k \gg \log n$.
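
To make the two dimensions concrete, here is a minimal brute-force sketch for a finite class over a finite domain. It rests on assumptions not spelled out in the abstract: hypotheses are encoded as 0/1 tuples, shattered trees are binary with every internal node having two children, and "average depth" is read as the expected length of a root-to-leaf branch chosen by a uniformly random walk. The function names are hypothetical and the recursion is exponential in the worst case, so it is for illustration only.

```python
from functools import lru_cache

# Assumption: a hypothesis is a tuple of 0/1 labels, one per domain point,
# and a class H is a non-empty frozenset of such tuples of equal length.

def _split(H, i):
    """Split H by the label its hypotheses assign to domain point i."""
    H0 = frozenset(h for h in H if h[i] == 0)
    H1 = frozenset(h for h in H if h[i] == 1)
    return H0, H1

@lru_cache(maxsize=None)
def littlestone_dim(H):
    """Depth of the deepest perfect binary tree shattered by H."""
    best = 0
    n = len(next(iter(H)))
    for i in range(n):
        H0, H1 = _split(H, i)
        if H0 and H1:  # point i can label the root of a shattered tree
            best = max(best, 1 + min(littlestone_dim(H0), littlestone_dim(H1)))
    return best

@lru_cache(maxsize=None)
def max_expected_depth(H):
    """Largest expected branch length over trees shattered by H,
    where a branch is drawn by a uniformly random walk from the root."""
    best = 0.0
    n = len(next(iter(H)))
    for i in range(n):
        H0, H1 = _split(H, i)
        if H0 and H1:
            best = max(best, 1 + (max_expected_depth(H0) + max_expected_depth(H1)) / 2)
    return best

def randomized_littlestone_dim(H):
    """Half of the largest average depth, matching 'average depth is 2d'."""
    return max_expected_depth(frozenset(H)) / 2

# Example: the four threshold functions on a three-point domain.
thresholds = frozenset({(1, 1, 1), (0, 1, 1), (0, 0, 1), (0, 0, 0)})
print(littlestone_dim(thresholds), randomized_littlestone_dim(thresholds))
```

For a class of two hypotheses that differ on a single point, the sketch gives a Littlestone dimension of 1 and a randomized value of 1/2, which corresponds to a learner that guesses the contested label with a fair coin.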
