Motivated by the increasing demand for multi-source data integration in various scientific fields, in this paper we study matrix completion in scenarios where the data exhibits certain block-wise missing structures -- specifically, where only a few noisy submatrices representing (overlapping) parts of the full matrix are available. We propose the Chain-linked Multiple Matrix Integration (CMMI) procedure to efficiently combine the information that can be extracted from these individual noisy submatrices. CMMI begins by deriving entity embeddings for each observed submatrix, then aligns these embeddings using overlapping entities between pairs of submatrices, and finally aggregates them to reconstruct the entire matrix of interest. We establish, under mild regularity conditions, entrywise error bounds and normal approximations for the CMMI estimates. Simulation studies and real data applications show that CMMI is computationally efficient and effective in recovering the full matrix, even when overlaps between the observed submatrices are minimal.
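The three steps described above (embed each submatrix, align the embeddings on overlapping entities, aggregate to fill in the unobserved block) can be illustrated for a single pair of submatrices. This is a minimal sketch under assumptions not stated in the abstract, and it is not the authors' implementation: the full matrix is taken to be symmetric, positive semidefinite, and of known rank `d`; the alignment is done with an orthogonal Procrustes fit; and the function names `embed`, `procrustes`, and `integrate_pair` are hypothetical.

```python
import numpy as np


def embed(block, d):
    """Rank-d spectral embedding of a symmetric block.

    Uses the d eigenvectors with largest |eigenvalue|, scaled by sqrt(|eigenvalue|).
    """
    vals, vecs = np.linalg.eigh(block)
    top = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))


def procrustes(A, B):
    """Orthogonal matrix W minimizing the Frobenius norm of A @ W - B."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt


def integrate_pair(block1, idx1, block2, idx2, d):
    """Estimate the entries between the entities of block1 and those of block2.

    idx1 and idx2 hold the entity labels of the rows of block1 and block2.
    Assumption: the two index sets share at least d entities.
    """
    idx1, idx2 = np.asarray(idx1), np.asarray(idx2)
    # Step 1: embed each observed submatrix separately.
    X1, X2 = embed(block1, d), embed(block2, d)
    # Step 2: align the first embedding to the second using the shared entities.
    _, pos1, pos2 = np.intersect1d(idx1, idx2, return_indices=True)
    W = procrustes(X1[pos1], X2[pos2])
    # Step 3: aggregate; rows follow idx1, columns follow idx2.
    return (X1 @ W) @ X2.T
```

In the noise-free case with a full-rank overlap, the returned matrix matches the corresponding block of the full matrix, including the entries that neither submatrix observes. For a chain of more than two submatrices, one natural extension of this sketch is to compose the pairwise alignment matrices along the chain.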