In the online learning with experts problem, an algorithm must make a prediction about an outcome on each of $T$ days (or times), given a set of $n$ experts who make predictions on each day (or time). The algorithm is given feedback on the outcome of each day, including the cost of its own prediction and the costs of the expert predictions, and the goal is to incur low total cost relative to the best expert in the set. Recent work by Srinivas, Woodruff, Xu, and Zhou (STOC 2022) introduced the study of the online learning with experts problem under memory constraints. However, the predictions made by experts or algorithms at one time often influence future outcomes, so the input is chosen adaptively. Whereas deterministic algorithms would be robust to adaptive inputs, existing algorithms all crucially use randomization to sample a small number of experts. In this paper, we study deterministic and robust algorithms for the experts problem. We first show a space lower bound of $\widetilde{\Omega}\left(\frac{nM}{RT}\right)$ for any deterministic algorithm that achieves regret $R$ when the best expert makes $M$ mistakes. Our result shows that the natural deterministic algorithm, which iterates through pools of experts until each expert in the pool has erred, is optimal up to polylogarithmic factors. On the positive side, we give a randomized algorithm that is robust to adaptive inputs and uses $\widetilde{O}\left(\frac{n}{R\sqrt{T}}\right)$ space for $M=O\left(\frac{R^2 T}{\log^2 n}\right)$, thereby showing a smooth space-regret trade-off.
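To make the "natural deterministic algorithm" concrete, the following is a minimal sketch of one way pool iteration could be implemented. Several details are assumptions not stated in the abstract: outcomes and predictions are binary, the algorithm predicts by majority vote among the experts in the current pool that have not yet erred, and it wraps around to the first pool after the last one is exhausted. The names `run_pool_sketch`, `pool_size`, and `alive` are hypothetical. For simplicity the sketch takes the whole prediction matrix as input, but the only state it carries between days is the current pool (at most `pool_size` indices) and the pool's starting offset.

```python
from typing import Sequence, Set


def run_pool_sketch(
    predictions: Sequence[Sequence[int]],
    outcomes: Sequence[int],
    pool_size: int,
) -> int:
    """Return the number of mistakes made by a pool-iteration sketch.

    predictions[t][i] is expert i's 0/1 prediction on day t and
    outcomes[t] is the true 0/1 outcome on day t.
    Assumption: majority vote within the pool, wrap-around over pools.
    """
    if not predictions:
        return 0
    n = len(predictions[0])
    if n == 0 or pool_size <= 0:
        raise ValueError("need at least one expert and a positive pool size")

    def new_pool(start: int) -> Set[int]:
        # Indices of the experts in the pool beginning at `start`.
        return set(range(start, min(start + pool_size, n)))

    start = 0
    alive = new_pool(start)  # pool members that have not erred yet
    mistakes = 0

    for day_preds, outcome in zip(predictions, outcomes):
        # Deterministic majority vote among surviving pool members
        # (ties are broken in favor of predicting 1).
        votes_for_one = sum(day_preds[i] for i in alive)
        guess = 1 if 2 * votes_for_one >= len(alive) else 0
        if guess != outcome:
            mistakes += 1

        # Discard every pool member that erred today.
        alive = {i for i in alive if day_preds[i] == outcome}

        # Once each expert in the pool has erred, move to the next pool.
        if not alive:
            start += pool_size
            if start >= n:
                start = 0  # assumed wrap-around to the first pool
            alive = new_pool(start)

    return mistakes
```

For instance, with four experts, `pool_size=2`, and an expert at index 2 that never errs, the sketch abandons the first pool after one bad day and then follows the second pool. A larger `pool_size` uses more memory but reaches a good expert after fewer exhausted pools, which is the kind of space-regret trade-off the bounds above quantify.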