We develop a new approach to conformal inference with missing data, a common but challenging problem in machine learning, focusing on data that are Missing at Random (MAR). We propose a procedure, Conformal prediction for Missing data under Multiple Robust Learning (CM--MRL), that combines split conformal calibration with a multiple robust empirical-likelihood (EL) reweighting scheme. The method performs a double calibration: EL reweights the complete-case scores so that their distribution matches the full calibration distribution implied by MAR, even when some working models are misspecified. Using empirical process theory, we establish the asymptotic behavior of our estimators and provide reliable coverage guarantees for our prediction intervals, both marginally and conditionally; we further show an interval-length dominance result. Several numerical experiments with missing data demonstrate the effectiveness of the proposed method.
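To make the idea of reweighting complete-case scores concrete, the following is a minimal sketch of a weighted split conformal quantile. It is not the CM--MRL procedure itself: the EL weights are replaced here by simple inverse-probability weights with a known observation probability, and the setting (a response that is missing at random given a scalar covariate), the predictor `mu`, the probability `pi`, and the function name `weighted_conformal_quantile` are all assumptions made for illustration. The sketch also omits the finite-sample correction that places extra mass on the test point.

```python
import numpy as np


def weighted_conformal_quantile(scores, weights, alpha):
    """Smallest score whose normalized cumulative weight reaches 1 - alpha."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores)
    sorted_scores = scores[order]
    cum_weight = np.cumsum(weights[order] / weights.sum())
    idx = np.searchsorted(cum_weight, 1.0 - alpha, side="left")
    return sorted_scores[min(idx, len(sorted_scores) - 1)]


# Toy calibration set (hypothetical data-generating process).
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1.0, 1.0, n)
y = 2.0 * x + rng.normal(0.0, 1.0 + 0.5 * (x > 0), n)

# Assumed-known probability that y is observed given x (the MAR mechanism).
pi = 1.0 / (1.0 + np.exp(-(0.5 + x)))
observed = rng.uniform(size=n) < pi


def mu(t):
    """Stand-in for a predictor fitted on a separate training split."""
    return 2.0 * t


# Complete-case absolute-residual scores, reweighted by 1 / pi as a
# stand-in for the EL weights described in the abstract.
scores = np.abs(y[observed] - mu(x[observed]))
weights = 1.0 / pi[observed]

q = weighted_conformal_quantile(scores, weights, alpha=0.1)
x_new = 0.3
interval = (mu(x_new) - q, mu(x_new) + q)
print(interval)
```

With equal weights the function reduces to an ordinary empirical quantile of the calibration scores; unequal weights shift the quantile toward the scores of units that are under-represented among the complete cases.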