This paper introduces a novel iterative method for missing data imputation that sequentially reduces the mutual information between data and their corresponding missing mask. Inspired by GAN-based approaches, which train generators to decrease the predictability of missingness patterns, our method explicitly targets the reduction of mutual information. Specifically, our algorithm iteratively minimizes the KL divergence between the joint distribution of the imputed data and missing mask, and the product of their marginals from the previous iteration. We show that the optimal imputation under this framework corresponds to solving an ODE, whose velocity field minimizes a rectified flow training objective. We further illustrate that some existing imputation techniques can be interpreted as approximate special cases of our mutual-information-reducing framework. Comprehensive experiments on synthetic and real-world datasets validate the efficacy of our proposed approach, demonstrating superior imputation performance.
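The core mechanism described above — train a velocity field on a rectified-flow objective, then solve the resulting ODE to transport imputed values — can be sketched on a toy problem. This is a hypothetical illustration, not the authors' implementation: the 1-D Gaussian data, the linear velocity model fit by least squares (a stand-in for a neural velocity field), and all variable names are assumptions made for clarity.

```python
# Hypothetical sketch of one rectified-flow iteration for imputation.
# x0 plays the role of the current imputed values; x1 plays the role of
# samples drawn under an independent (product-of-marginals) coupling.
# The linear velocity model is an illustrative stand-in for the neural
# velocity field the paper would train.
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D populations: current imputations vs. independent-coupling targets.
x0 = rng.normal(0.0, 1.0, size=(512, 1))
x1 = rng.normal(2.0, 0.5, size=(512, 1))

# Rectified-flow training pairs: x_t = (1 - t) x0 + t x1 with target x1 - x0,
# i.e. minimize E || v(x_t, t) - (x1 - x0) ||^2 over the velocity model.
t = rng.uniform(0.0, 1.0, size=(512, 1))
x_t = (1.0 - t) * x0 + t * x1
target = x1 - x0

# Fit a linear velocity field v(x, t) = a*x + b*t + c by least squares.
features = np.hstack([x_t, t, np.ones_like(t)])
coef, *_ = np.linalg.lstsq(features, target, rcond=None)

def velocity(x, t_scalar):
    """Evaluate the fitted velocity field at time t_scalar."""
    f = np.hstack([x, np.full_like(x, t_scalar), np.ones_like(x)])
    return f @ coef

# Euler-integrate the ODE dx/dt = v(x, t) to transport the imputed values.
x = x0.copy()
n_steps = 50
for k in range(n_steps):
    x = x + velocity(x, k / n_steps) / n_steps

# The transported sample mean should move from ~0 toward the target mean ~2.
print(float(x.mean()))
```

In the paper's setting, `x1` would instead come from re-pairing imputed data with independently drawn missing masks, so that integrating the ODE drives the joint distribution of data and mask toward the product of its marginals, reducing their mutual information at each iteration.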