Partial Feedback Online Learning

We study partial-feedback online learning, where each instance admits a set of correct labels but the learner observes only one correct label per round; any prediction within the correct set counts as correct. This model captures settings such as language generation, where multiple responses may be valid but the data provide only a single reference. We give a near-complete characterization of minimax regret for both deterministic and randomized learners in the set-realizable regime, i.e., the regime in which sublinear regret is generally attainable. For deterministic learners, we introduce the Partial-Feedback Littlestone dimension (PFLdim) and show that it precisely governs learnability and minimax regret; technically, PFLdim cannot be defined via the standard version space, and instead requires a new collection version space viewpoint together with an auxiliary dimension used only in the proof. We further develop the Partial-Feedback Measure Shattering dimension (PMSdim) to obtain tight bounds for randomized learners. We identify broad conditions ensuring inseparability between deterministic and randomized learnability (e.g., finite Helly number or nested-inclusion label structure), and extend the argument to set-valued online learning, resolving an open question of Raman et al. [2024b]. Finally, we show a sharp separation from weaker realistic and agnostic variants: outside set realizability, the problem can become information-theoretically intractable, with linear regret possible even for $|H|=2$. This highlights the need for fundamentally new, noise-sensitive complexity measures to meaningfully characterize learnability beyond set realizability.
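The interaction protocol described above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the paper's algorithm: the hypothesis class is assumed finite, each hypothesis is assumed to map an instance to a nonempty set of correct labels, and the learner is a naive majority vote over hypotheses consistent with the revealed labels. The names `run_protocol`, `Hypothesis`, `h1`, `h2`, and `h3` are hypothetical. Since the abstract notes that the standard version space does not suffice to define PFLdim, this baseline should not be expected to carry the paper's guarantees; it only shows what the learner sees and how mistakes are counted.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumption: a hypothesis maps each instance to a nonempty set of correct labels.
Hypothesis = Dict[str, FrozenSet[str]]


def run_protocol(
    hypotheses: List[Hypothesis],
    target: Hypothesis,
    rounds: List[Tuple[str, str]],
) -> int:
    """Simulate partial-feedback rounds and return the number of mistakes.

    Each round is (instance, revealed_label), where the revealed label is
    one element of the target's correct set. The learner never sees the
    full correct set; only the environment uses it to score predictions.
    """
    version_space = list(hypotheses)
    mistakes = 0
    for x, revealed in rounds:
        # Set-realizable feedback: the revealed label is a correct one.
        assert revealed in target[x]

        # Naive baseline (not the paper's learner): predict the label that
        # the most surviving hypotheses consider correct, ties broken
        # alphabetically.
        votes: Dict[str, int] = {}
        for h in version_space:
            for label in h[x]:
                votes[label] = votes.get(label, 0) + 1
        prediction = max(sorted(votes), key=votes.get)

        # Any prediction inside the correct set counts as correct,
        # even if it differs from the revealed label.
        if prediction not in target[x]:
            mistakes += 1

        # Keep only hypotheses whose correct set contains the revealed label.
        version_space = [h for h in version_space if revealed in h[x]]
    return mistakes


h1 = {"q1": frozenset({"a", "b"}), "q2": frozenset({"c"})}
h2 = {"q1": frozenset({"b"}), "q2": frozenset({"a"})}
h3 = {"q1": frozenset({"c"}), "q2": frozenset({"a", "c"})}

# With target h1, the learner predicts "b" on q1 while "a" is revealed;
# both are in the correct set, so no mistake is charged.
no_mistakes = run_protocol([h1, h2, h3], h1, [("q1", "a"), ("q2", "c"), ("q1", "a")])

# With target h3, the first prediction "b" falls outside the correct set {"c"}.
one_mistake = run_protocol([h1, h2, h3], h3, [("q1", "c"), ("q2", "a")])
```

In the first run the prediction and the revealed label differ without any loss, which is the feature that distinguishes this setting from standard multiclass online learning.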
