We study the problem of imputing missing values in a dataset, which has important applications in many domains. The key to missing value imputation is to capture the data distribution from incomplete samples and impute the missing values accordingly. Leveraging the fact that any two batches of data with missing values come from the same data distribution, we propose to impute the missing values of two batches of samples by transforming them into a latent space through deep invertible functions and matching their distributions there. We propose a simple and well-motivated algorithm that learns the transformations and imputes the missing values simultaneously. Our algorithm has few hyperparameters to tune and generates high-quality imputations regardless of how the missing values are generated. Extensive experiments over a large number of datasets and competing benchmark algorithms show that our method achieves state-of-the-art performance.
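The core idea, treating the missing entries of two batches as free variables and adjusting them until the batches match distributionally in a transformed space, can be illustrated with a small NumPy sketch. This is not the proposed algorithm: the abstract does not name the discrepancy measure or the architecture, so the sketch assumes a squared maximum mean discrepancy (MMD) with a Gaussian kernel, and it replaces the learned deep invertible functions with a single fixed invertible linear map `W`. All function names and parameters (`impute_two_batches`, `sigma`, `lr`, `steps`) are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    # Pairwise Gaussian kernel between the rows of a (n, d) and b (m, d).
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def mmd2(z1, z2, sigma):
    # Biased estimate of the squared MMD between two latent batches.
    return (gaussian_kernel(z1, z1, sigma).mean()
            + gaussian_kernel(z2, z2, sigma).mean()
            - 2.0 * gaussian_kernel(z1, z2, sigma).mean())

def mmd2_grad(z1, z2, sigma):
    # Analytic gradient of mmd2 with respect to the rows of z1.
    n, m = len(z1), len(z2)
    k11 = gaussian_kernel(z1, z1, sigma)
    k12 = gaussian_kernel(z1, z2, sigma)
    d11 = z1[:, None, :] - z1[None, :, :]  # shape (n, n, d)
    d12 = z1[:, None, :] - z2[None, :, :]  # shape (n, m, d)
    within = -2.0 / (n * n * sigma ** 2) * (k11[:, :, None] * d11).sum(1)
    cross = 2.0 / (n * m * sigma ** 2) * (k12[:, :, None] * d12).sum(1)
    return within + cross

def impute_two_batches(x1, x2, W, sigma=1.0, steps=200, lr=1.0):
    # NaN marks a missing entry; observed entries are never modified.
    m1, m2 = np.isnan(x1), np.isnan(x2)
    col_mean = np.nanmean(np.vstack([x1, x2]), axis=0)
    x1 = np.where(m1, col_mean, x1)  # start from mean imputation
    x2 = np.where(m2, col_mean, x2)
    loss = mmd2(x1 @ W, x2 @ W, sigma)
    losses = [loss]
    for _ in range(steps):
        # Chain rule through the latent map z = x @ W.
        g1 = mmd2_grad(x1 @ W, x2 @ W, sigma) @ W.T
        g2 = mmd2_grad(x2 @ W, x1 @ W, sigma) @ W.T
        step = lr
        while step > 1e-8:
            # Move only the missing entries; halve the step until
            # the latent-space discrepancy does not increase.
            c1 = x1 - step * g1 * m1
            c2 = x2 - step * g2 * m2
            new_loss = mmd2(c1 @ W, c2 @ W, sigma)
            if new_loss <= loss:
                x1, x2, loss = c1, c2, new_loss
                break
            step *= 0.5
        losses.append(loss)
    return x1, x2, losses

# Toy data (assumption): 64 samples, 3 features, about 20% entries missing.
rng = np.random.default_rng(0)
full = rng.normal(size=(64, 3))
full[:, 2] = full[:, 0] + 0.1 * full[:, 2]  # make two features correlated
data = full.copy()
data[rng.random(data.shape) < 0.2] = np.nan
batch1, batch2 = data[:32], data[32:]

# Fixed invertible map (unit upper-triangular), a stand-in for a learned one.
W = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 1.0]])

imp1, imp2, losses = impute_two_batches(batch1, batch2, W)
```

In the setting described above, the invertible transformation would itself be a deep network trained jointly with the imputed values; the fixed `W` here only keeps the example short and dependency-free.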