In the experts problem, on each of $T$ days, an agent must follow the advice of one of $n$ ``experts''. After each day, the loss associated with each expert's advice is revealed. A fundamental result in learning theory says that the agent can achieve vanishing regret, i.e., its cumulative loss is within $o(T)$ of the cumulative loss of the best-in-hindsight expert. Can the agent perform well without sufficient space to remember all the experts? We extend a nascent line of research on this question in two directions:
$\bullet$ We give a new algorithm against the oblivious adversary, improving over the memory-regret tradeoff obtained by [PZ23] and nearly matching the lower bound of [SWXZ22].
$\bullet$ We also consider an adaptive adversary who can observe the experts previously chosen by the agent. In this setting we give both a new algorithm and a novel lower bound, proving that roughly $\sqrt{n}$ memory is both necessary and sufficient for obtaining $o(T)$ regret.
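For context, the classical full-memory baseline behind the "fundamental result" can be sketched as follows. This is a minimal illustration of the standard multiplicative weights (Hedge) method, not the algorithms of this work: it stores one cumulative loss per expert, so it uses $\Theta(n)$ memory, which is exactly the cost the memory-bounded setting tries to avoid. The function names `hedge` and `demo`, the learning rate $\eta = \sqrt{8 \ln n / T}$, and the synthetic loss sequence are assumptions made for the example; losses are assumed to lie in $[0, 1]$ and to be fixed in advance (an oblivious adversary).

```python
import math
import random


def hedge(losses, eta=None, seed=0):
    """Multiplicative weights over a T x n list of losses in [0, 1].

    Returns (expected_loss, realized_loss, best_expert_loss).
    Keeps one counter per expert, i.e. Theta(n) memory.
    """
    rng = random.Random(seed)
    T, n = len(losses), len(losses[0])
    if eta is None:
        # Assumed tuning; yields expected regret at most sqrt(T ln(n) / 2).
        eta = math.sqrt(8.0 * math.log(n) / T)

    cum = [0.0] * n  # cumulative loss of each expert
    expected_loss = 0.0
    realized_loss = 0.0

    for loss_t in losses:
        # Weights proportional to exp(-eta * cumulative loss);
        # subtracting the minimum keeps the exponentials well scaled.
        m = min(cum)
        weights = [math.exp(-eta * (c - m)) for c in cum]
        total = sum(weights)
        probs = [w / total for w in weights]

        # The agent follows one expert sampled from the distribution.
        choice = rng.choices(range(n), weights=probs)[0]
        realized_loss += loss_t[choice]
        expected_loss += sum(p * l for p, l in zip(probs, loss_t))

        # Losses of all experts are revealed after the day ends.
        for i in range(n):
            cum[i] += loss_t[i]

    return expected_loss, realized_loss, min(cum)


def demo(T=2000, n=16, seed=1):
    """Synthetic oblivious loss sequence (hypothetical data)."""
    rng = random.Random(seed)
    losses = [[rng.random() for _ in range(n)] for _ in range(T)]
    for row in losses:
        row[3] *= 0.5  # make expert 3 better on average
    expected_loss, _, best = hedge(losses)
    regret = expected_loss - best
    bound = math.sqrt(T * math.log(n) / 2.0)
    return regret, bound


if __name__ == "__main__":
    regret, bound = demo()
    print(f"expected regret: {regret:.2f}, bound: {bound:.2f}")
```

The expected regret grows like $\sqrt{T \ln n}$, which is $o(T)$; the question studied here is how much of this guarantee survives when the array `cum` no longer fits in memory.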