In this paper we consider the problem of universal {\em batch} learning in a misspecification setting with log-loss. In this setting the hypothesis class is a set of models $\Theta$. However, the data is generated by an unknown distribution that may not belong to this set but comes from a larger set of models $\Phi \supset \Theta$. Given a training sample, a universal learner is required to predict a probability distribution for the next outcome, and a log-loss is incurred. The universal learner's performance is measured by the regret relative to the best hypothesis matching the data, chosen from $\Theta$. Utilizing the minimax theorem and information-theoretic tools, we derive the optimal universal learner, a mixture over the set of data-generating distributions, and obtain a closed-form expression for the minimax regret. We show that this regret can be viewed as a constrained version of the conditional capacity between the data and the set of its generating distributions. We present tight bounds for this minimax regret, implying that the complexity of the problem is dominated by the richness of the hypothesis class $\Theta$ and not by the data-generating distribution set $\Phi$. We develop an extension of the Arimoto-Blahut algorithm for numerical evaluation of the regret and its capacity-achieving prior distribution. We demonstrate our results for the case where the observations come from a $K$-parameter multinomial distribution while the hypothesis class $\Theta$ is only a subset of this family of distributions.
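For context, the paper's numerical procedure extends the classical Arimoto-Blahut alternating-maximization iteration for channel capacity. The sketch below implements only that classical baseline (not the paper's constrained extension) for a discrete memoryless channel $W(y|x)$; the function name and tolerance parameters are illustrative choices, not from the paper.

```python
import numpy as np

def blahut_arimoto(W, tol=1e-9, max_iter=1000):
    """Classical Blahut-Arimoto iteration for the capacity of a
    discrete memoryless channel with transition matrix W[x, y] = P(y|x).
    Returns (capacity in bits, capacity-achieving input prior p)."""
    n_x, _ = W.shape
    p = np.full(n_x, 1.0 / n_x)            # start from the uniform prior
    for _ in range(max_iter):
        # Posterior q(x|y) induced by the current prior p
        q = p[:, None] * W                  # joint p(x) * W(y|x)
        q /= q.sum(axis=0, keepdims=True)   # normalize over x for each y
        # Multiplicative update: p(x) proportional to exp(sum_y W(y|x) log q(x|y))
        log_r = np.sum(W * np.log(q + 1e-300), axis=1)
        r = np.exp(log_r - log_r.max())     # subtract max for numerical stability
        p_new = r / r.sum()
        if np.max(np.abs(p_new - p)) < tol:
            p = p_new
            break
        p = p_new
    # Mutual information I(p; W) at the final prior, in bits
    p_y = (p[:, None] * W).sum(axis=0)
    ratio = np.where(W > 0, W / p_y, 1.0)   # avoid log(0) where W(y|x) = 0
    capacity = np.sum(p[:, None] * W * np.log2(ratio))
    return capacity, p
```

For a binary symmetric channel with crossover probability $0.1$, this recovers the textbook value $C = 1 - h_2(0.1) \approx 0.531$ bits with a uniform capacity-achieving prior. The paper's extension additionally handles the constraint that the maximizing prior is supported on the data-generating set while the regret is measured against $\Theta$.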