This paper addresses the problem of universal learning under model misspecification with log-loss. In this setting, the learner operates with a hypothesis class of models denoted by $\Theta$, while the true data-generating process belongs to a broader class $\Phi \supset \Theta$, and may lie outside the assumed hypothesis space. Classical approaches have characterized the minimax regret and identified optimal universal learners in both the well-specified stochastic and individual deterministic frameworks. The misspecified setting has received comparatively less attention, although several important results have emerged in recent years. Extending these foundations, we analyze the minimax regret in the misspecified setting and derive the corresponding optimal universal learner. We propose this formulation as a unified framework for universal learning, applicable to any form of uncertainty in the data-generating process, across both online and batch data arrival modes, as well as supervised and unsupervised learning tasks.