The empirical risk minimization (ERM) problem with relative entropy regularization (ERM-RER) is investigated under the assumption that the reference measure is a $\sigma$-finite measure, and not necessarily a probability measure. This assumption generalizes the ERM-RER problem, allowing greater flexibility in the incorporation of prior knowledge, and numerous relevant properties are established under it. In particular, the solution to this problem, if it exists, is shown to be a unique probability measure that is mutually absolutely continuous with the reference measure. Such a solution exhibits a probably-approximately-correct guarantee for the ERM problem, independently of whether the latter possesses a solution. For a fixed dataset and under a specific condition, the empirical risk is shown to be a sub-Gaussian random variable when the models are sampled from the solution to the ERM-RER problem. The generalization capabilities of the solution to the ERM-RER problem (the Gibbs algorithm) are studied via the sensitivity of the expected empirical risk to deviations from such a solution towards alternative probability measures. Finally, a connection between sensitivity, generalization error, and lautum information is established.
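A minimal sketch of the problem described above, under assumed notation not taken from the paper: let $Q$ be the $\sigma$-finite reference measure on the set of models $\mathcal{M}$, let $\mathsf{L}(\theta)$ denote the empirical risk of model $\theta$ on the fixed dataset, and let $\lambda > 0$ be the regularization factor. The ERM-RER problem can then be written as

```latex
% ERM-RER: minimize the expected empirical risk plus a relative
% entropy penalty with respect to the sigma-finite reference Q.
\min_{P \ll Q} \; \int_{\mathcal{M}} \mathsf{L}(\theta) \, \mathrm{d}P(\theta)
  \;+\; \lambda \, D(P \| Q),

% When the normalizing integral below is finite, the unique solution
% is the Gibbs measure, which is mutually absolutely continuous with Q
% since its density with respect to Q is strictly positive:
\frac{\mathrm{d}P^{\star}}{\mathrm{d}Q}(\theta)
  \;=\; \frac{\exp\!\bigl(-\tfrac{1}{\lambda}\mathsf{L}(\theta)\bigr)}
             {\int_{\mathcal{M}} \exp\!\bigl(-\tfrac{1}{\lambda}\mathsf{L}(\nu)\bigr)\,\mathrm{d}Q(\nu)}.
```

The strictly positive exponential density is what yields the mutual absolute continuity with the reference measure claimed in the abstract; when $Q$ is merely $\sigma$-finite rather than a probability measure, the finiteness of the normalizing integral becomes a genuine condition rather than an automatic consequence.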