This paper addresses universal learning under model misspecification with log-loss. In this setting, the learner works with a hypothesis class of models $\Theta$, while the true data-generating process belongs to a broader class $\Phi \supset \Theta$ and may therefore lie outside the assumed hypothesis class. Classical work has characterized the minimax regret and identified the optimal universal learners in both the well-specified stochastic setting and the individual (deterministic) setting. The misspecified setting has received comparatively less attention, although several important results have emerged in recent years. Building on these foundations, we analyze the minimax regret in the misspecified setting and derive the corresponding optimal universal learner. We propose this formulation as a unified framework for universal learning that applies to any form of uncertainty in the data-generating process, to both online and batch data arrival, and to both supervised and unsupervised learning tasks.
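For reference, the classical individual-sequence result can be made concrete in its simplest well-known case. The following is a minimal sketch, not the misspecified learner derived in this paper. It assumes a Bernoulli hypothesis class over binary sequences of length $n$ and computes the Shtarkov sum $\sum_{x^n} \max_\theta p_\theta(x^n)$. Its logarithm is the minimax log-loss regret, and the normalized maximum likelihood (NML) distribution attains it. The function names are hypothetical and chosen for illustration only.

```python
import math
from itertools import product


def bernoulli_ml_prob(k: int, n: int) -> float:
    """Maximum-likelihood probability of one specific binary sequence
    of length n containing k ones (illustrative Bernoulli class)."""
    if k == 0 or k == n:
        return 1.0  # ML parameter is 0 or 1, so the sequence gets probability 1
    p = k / n
    return p**k * (1 - p) ** (n - k)


def shtarkov_sum(n: int) -> float:
    """Sum over all 2^n sequences of max_theta p_theta(x^n), grouped by
    the number of ones k (all sequences with equal k share one ML value)."""
    return sum(math.comb(n, k) * bernoulli_ml_prob(k, n) for k in range(n + 1))


def minimax_regret_bits(n: int) -> float:
    """Minimax individual-sequence regret, in bits, for the Bernoulli class."""
    return math.log2(shtarkov_sum(n))


def nml_prob(seq: tuple) -> float:
    """NML probability: ML probability of seq divided by the Shtarkov sum."""
    n, k = len(seq), sum(seq)
    return bernoulli_ml_prob(k, n) / shtarkov_sum(n)


if __name__ == "__main__":
    n = 4
    # NML regret log2(p_ML(x) / q_NML(x)) is the same for every sequence.
    for seq in [(0, 0, 0, 0), (1, 0, 1, 1)]:
        r = math.log2(bernoulli_ml_prob(sum(seq), n) / nml_prob(seq))
        print(seq, round(r, 6), round(minimax_regret_bits(n), 6))
```

In this example the regret of NML is identical for every sequence (an equalizer property), which is why it is minimax in the individual setting. The misspecified analysis in this paper addresses the more general case where the data-generating process ranges over $\Phi \supset \Theta$.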