Modern complex datasets often consist of multiple sub-populations. To develop robust and generalizable methods in the presence of sub-population heterogeneity, it is important to guarantee uniform learning performance across sub-populations rather than merely good average performance. In many applications, prior information is available on which sub-population, or group, each data point belongs to. Given the observed groups of data, we develop a min-max-regret (MMR) learning framework for general supervised learning, which aims to minimize the worst-group regret. Motivated by the regret-based decision-theoretic framework, the proposed MMR differs from the value-based or risk-based robust learning methods in the existing literature. The regret criterion simultaneously enjoys several robustness and invariance properties. In terms of generalizability, we establish theoretical guarantees for the worst-case regret over a super-population of the meta data, which incorporates the observed sub-populations, their mixtures, and other unseen sub-populations that can be approximated by the observed ones. We demonstrate the effectiveness of our method through extensive simulation studies and an application to kidney transplantation data from hundreds of transplant centers.
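The abstract does not spell out the estimator, so the following is only a minimal sketch of the min-max-regret idea under illustrative assumptions: a one-parameter linear model y ≈ b·x without intercept, squared loss, and the regret of a group defined as its risk minus the smallest risk attainable on that group alone. The function names `risk`, `group_optimum`, `regret`, `worst_regret`, and `fit_mmr`, and the toy data, are hypothetical and do not come from the paper.

```python
"""Toy illustration of min-max-regret (MMR) fitting.

Hypothetical sketch: a one-parameter linear model y ~ b * x with squared
loss; regret of a group = its risk minus the best risk achievable on that
group alone. This is not the paper's estimator.
"""
from typing import List, Tuple

Group = List[Tuple[float, float]]  # (x, y) pairs belonging to one group


def risk(b: float, data: Group) -> float:
    """Empirical squared-error risk of slope b on one group."""
    return sum((y - b * x) ** 2 for x, y in data) / len(data)


def group_optimum(data: Group) -> float:
    """Least-squares slope (no intercept) fitted to a single group."""
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)


def regret(b: float, data: Group) -> float:
    """Excess risk of b over the best slope for this group (assumed definition)."""
    return risk(b, data) - risk(group_optimum(data), data)


def worst_regret(b: float, groups: List[Group]) -> float:
    """Largest regret across the observed groups."""
    return max(regret(b, g) for g in groups)


def fit_mmr(groups: List[Group], iters: int = 200) -> float:
    """Minimize the worst-group regret by ternary search.

    Each group's regret is a convex quadratic in b, so their pointwise max
    is convex, and its minimizer lies between the smallest and largest
    group-specific optimal slopes.
    """
    lo = min(group_optimum(g) for g in groups)
    hi = max(group_optimum(g) for g in groups)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if worst_regret(m1, groups) < worst_regret(m2, groups):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


if __name__ == "__main__":
    # Group A is noiseless with optimal slope 1; group B is noisy with slope 2.
    A = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
    B = [(1.0, 3.0), (1.0, 1.0), (2.0, 6.0), (2.0, 2.0)]
    b = fit_mmr([A, B])
    print(f"{b:.4f}")  # prints 1.4226
```

On this toy data the MMR slope equalizes the two groups' regrets at about 1.4226. Minimizing the worst-group risk instead would pull the slope to roughly 1.75, because group B's irreducible noise inflates its risk at every b; subtracting each group's own best risk removes that noise floor, which is one way to read the invariance property mentioned in the abstract.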