In this paper, the worst-case probability measure over the data is introduced as a tool for characterizing the generalization capabilities of machine learning algorithms. More specifically, the worst-case probability measure is a Gibbs probability measure and is the unique solution to the problem of maximizing the expected loss subject to a relative entropy constraint with respect to a reference probability measure. Fundamental generalization metrics, such as the sensitivity of the expected loss, the sensitivity of the empirical risk, and the generalization gap, are shown to have closed-form expressions involving the worst-case data-generating probability measure. Existing results for the Gibbs algorithm, such as the characterization of the generalization gap as the sum of the mutual information and the lautum information, up to a constant factor, are recovered. A novel parallel is established between the worst-case data-generating probability measure and the Gibbs algorithm. Specifically, the Gibbs probability measure is identified as a fundamental commonality of the model space and the data space for machine learning algorithms.
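As a rough illustration of the optimization described above, the following sketch builds a Gibbs (exponentially tilted) measure on a finite data space and compares it with the reference measure. Everything concrete here is an assumption made for illustration and is not taken from the paper: the four-point data space, the uniform reference measure `p0`, the loss values, the parameter `beta`, and the helper names `gibbs_tilt`, `expected_loss`, and `relative_entropy`. The relative entropy budget is taken to be whatever value the chosen `beta` induces, which is the standard Lagrangian reading of such a constraint.

```python
import math

def gibbs_tilt(p0, losses, beta):
    """Tilt the reference measure p0 by exp(loss / beta) and renormalize."""
    weights = [p * math.exp(l / beta) for p, l in zip(p0, losses)]
    total = sum(weights)  # normalizing constant
    return [w / total for w in weights]

def expected_loss(p, losses):
    """Expected loss of one fixed model under the data measure p."""
    return sum(pi * li for pi, li in zip(p, losses))

def relative_entropy(p, q):
    """D(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical toy setup (not from the paper): four data points,
# a uniform reference measure, and the loss of one fixed model on each point.
p0 = [0.25, 0.25, 0.25, 0.25]
losses = [0.0, 1.0, 2.0, 3.0]
beta = 1.0  # assumed multiplier; a smaller beta corresponds to a larger entropy budget

p_star = gibbs_tilt(p0, losses, beta)
budget = relative_entropy(p_star, p0)  # constraint level induced by this beta

# Another measure that stays within the same budget: an even mixture
# of the reference measure and the tilted measure.
q = [0.5 * a + 0.5 * b for a, b in zip(p0, p_star)]

print("reference expected loss:", expected_loss(p0, losses))
print("tilted expected loss:   ", expected_loss(p_star, losses))
print("mixture expected loss:  ", expected_loss(q, losses))
print("relative entropy budget:", budget)
```

In this toy setting the tilted measure shifts mass toward high-loss data points, and the mixture `q`, which also satisfies the budget, attains a smaller expected loss. This is consistent with the Gibbs measure being the maximizer, although a single comparison is only a sanity check and not a proof.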