Effective data imputation requires discovering rich latent ``structure'' in ``plain'' tabular data. Recent graph neural network-based imputation methods show strong structure-learning potential by directly representing tabular data as bipartite graphs. However, because they model no relations between samples, these methods treat all samples equally, which contradicts an important observation: ``similar samples should give more information about missing values.'' This paper presents a novel Iterative graph Generation and Reconstruction framework for Missing data imputation (IGRM). Instead of treating all samples equally, we introduce the concept of ``friend networks'' to represent different relations among samples. To generate an accurate friend network despite missing data, we design an end-to-end friend network reconstruction solution that allows the friend network to be optimized continuously during imputation learning. The representation of the optimized friend network is, in turn, used to further improve the imputation process through differentiated message passing. Experimental results on eight benchmark datasets show that IGRM yields 39.13% lower mean absolute error compared with nine baselines and 9.04% lower than the second-best. Our code is available at https://github.com/G-AILab/IGRM.
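
To make the two graph views mentioned above more concrete, the following is a minimal sketch, not the IGRM implementation. It assumes missing entries are encoded as NaN, builds the sample-feature bipartite edges from observed cells, and then builds a naive "friend network" by linking each sample to its nearest neighbour over co-observed features. The function names `table_to_bipartite` and `naive_friend_network`, the distance measure, and the parameter `k` are all hypothetical; IGRM learns and reconstructs its friend network end-to-end rather than using a fixed similarity rule like this one.

```python
import numpy as np

def table_to_bipartite(X):
    """Return (sample_idx, feature_idx, value) for every observed cell.

    Each observed cell X[i, j] becomes one edge between sample node i
    and feature node j, with the cell value as the edge attribute.
    """
    X = np.asarray(X, dtype=float)
    rows, cols = np.nonzero(~np.isnan(X))
    return rows, cols, X[rows, cols]

def naive_friend_network(X, k=1):
    """Hypothetical stand-in for a friend network (NOT the learned one).

    Links each sample to its k closest samples, where distance is the
    mean absolute difference over features observed in both samples.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    observed = ~np.isnan(X)
    dist = np.full((n, n), np.inf)  # inf: no self-links, no links without shared features
    for a in range(n):
        for b in range(n):
            shared = observed[a] & observed[b]
            if a != b and shared.any():
                dist[a, b] = np.abs(X[a, shared] - X[b, shared]).mean()
    return np.argsort(dist, axis=1)[:, :k]

# Toy table: 3 samples, 3 features, two missing cells.
X = [[1.0, np.nan, 3.0],
     [1.1, 2.0, np.nan],
     [9.0, 8.0, 7.0]]

rows, cols, values = table_to_bipartite(X)
friends = naive_friend_network(X, k=1)
print(list(zip(rows.tolist(), cols.tolist(), values.tolist())))
print(friends.tolist())  # [[1], [0], [0]]
```

In this toy table, samples 0 and 1 agree closely on their one shared feature, so they become each other's friends, which is the intuition behind letting similar samples contribute more to a missing value.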