Distribution shift can significantly impede the application of machine learning models, because the standard assumption in machine learning and statistics that training and test samples come from the same population often fails in practice. One way to tackle this problem is invariant learning, such as invariant risk minimization (IRM), which learns an invariant representation that helps generalization under distribution shift. This paper develops methods for constructing distribution-free prediction regions that quantify uncertainty for invariant representations while accounting for distribution shifts across data from different environments. Our approach uses a weighted conformity score that adapts to the environment of the test sample. We construct an adaptive conformal interval from the weighted conformity score and prove its conditional average coverage under certain conditions. To demonstrate the effectiveness of our approach, we conduct several numerical experiments, including simulation studies and a practical example using real-world data.
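To make the idea of a weighted conformal interval concrete, the following is a minimal sketch of generic weighted split conformal prediction with absolute-residual scores. It is not the construction developed in this paper: the function names `weighted_quantile` and `weighted_conformal_interval`, the choice of score, and the per-sample weights `cal_weights` and `test_weight` (imagined here as reflecting how closely each calibration sample's environment matches the test environment) are all assumptions made for illustration.

```python
import numpy as np


def weighted_quantile(scores, weights, test_weight, alpha):
    """Level-(1 - alpha) quantile of the weighted calibration scores.

    The test point's own weight is treated as mass at +infinity, as in
    standard weighted split conformal prediction (an assumption here).
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores)
    scores, weights = scores[order], weights[order]
    # Normalise by the total mass, including the test point's weight.
    cum = np.cumsum(weights) / (weights.sum() + test_weight)
    # First calibration score whose cumulative mass reaches 1 - alpha.
    idx = np.searchsorted(cum, 1.0 - alpha, side="left")
    return scores[idx] if idx < len(scores) else np.inf


def weighted_conformal_interval(pred, cal_residuals, cal_weights,
                                test_weight, alpha=0.1):
    """Symmetric interval around a point prediction `pred`.

    `cal_residuals` are y - prediction on a held-out calibration set;
    the weights are hypothetical environment-dependent weights.
    """
    q = weighted_quantile(np.abs(cal_residuals), cal_weights,
                          test_weight, alpha)
    return pred - q, pred + q
```

With uniform weights this reduces to ordinary split conformal prediction. Larger weights on calibration samples from environments resembling the test environment pull the quantile toward their residuals, and if the test weight dominates the total mass the interval becomes unbounded.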