Data imputation is a crucial task due to the widespread occurrence of missing data. Many methods adopt a two-step approach: first constructing a preliminary imputation (the "draft") and then refining it to produce the final imputation, a strategy commonly referred to as "draft-then-refine". In our study, we examine this prevalent strategy through the lens of graph Dirichlet energy. We observe that a basic "draft" imputation tends to decrease the Dirichlet energy, so a subsequent "refine" step is needed to restore the overall energy balance. However, existing refinement techniques, such as the Graph Convolutional Network (GCN), often reduce the energy even further. To address this, we introduce a new framework, the Graph Laplacian Pyramid Network (GLPN). GLPN combines a U-shaped autoencoder with residual networks to capture both global structure and local details effectively. Extensive experiments on multiple real-world datasets show that GLPN consistently outperforms state-of-the-art methods under three different missing-data mechanisms. The code is available at https://github.com/liguanlue/GLPN.
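The energy argument above can be illustrated with a toy sketch. The graph Dirichlet energy of a signal $x$ on a graph with Laplacian $L = D - A$ is the standard quadratic form $E(x) = x^\top L x = \tfrac{1}{2}\sum_{ij} A_{ij}(x_i - x_j)^2$. The example below (a hypothetical 4-node path graph, not from the paper) shows that mean-filling a missing entry (the "draft") lowers the energy relative to the fully observed signal, and that one step of mean-aggregation smoothing (a GCN-like refinement, here simplified to $D^{-1}Ax$) lowers it further rather than restoring it:

```python
import numpy as np

# Toy 4-node path graph (an illustrative assumption, not the paper's data)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A  # unnormalized graph Laplacian

def dirichlet_energy(x, L):
    """E(x) = x^T L x = 1/2 * sum_ij A_ij (x_i - x_j)^2."""
    return float(x @ L @ x)

x = np.array([3.0, -1.0, 2.0, 0.5])  # fully observed signal
x_draft = x.copy()
x_draft[1] = x.mean()                # "draft": fill one missing entry with the mean

# One step of neighbor-mean smoothing, a simplified GCN-style "refine"
D_inv = np.diag(1.0 / A.sum(axis=1))
x_refined = D_inv @ A @ x_draft

print(dirichlet_energy(x, L))          # energy of the true signal
print(dirichlet_energy(x_draft, L))    # lower: draft imputation smooths the signal
print(dirichlet_energy(x_refined, L))  # lower still: smoothing refinement drops energy further
```

This is only a minimal numerical illustration of the abstract's claim; GLPN's actual architecture (U-shaped autoencoder with residual connections) is described in the paper and repository.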