Modern matrix completion problems often involve heterogeneous data whose rows belong to several meta-categories at once, such as demographic and age groups in recommendation systems, or region and recording-session labels in neural electrophysiology experiments. Standard low-rank estimators impose a single global latent geometry, which can recover average structure but may smooth away subgroup-specific variation, especially when observations are unevenly distributed across groups. We introduce Group-Aware Matrix Estimation (GAME), a convex estimator for overlapping subgroup-wise low-rank matrix estimation. GAME regularizes category-specific submatrices through overlapping nuclear-norm penalties, allowing related groups to borrow information while preserving local latent structure in a shared coordinate system. We provide finite-sample guarantees for both reconstruction error and subgroup-specific subspace recovery, showing how performance depends on sampling density, subgroup rank, and overlap structure. Experiments on synthetic, recommendation, ecological, and neuroscience datasets show that GAME is most beneficial in structured-missingness regimes, where subgroup-aware regularization improves both reconstruction accuracy and latent subspace fidelity. Across these benchmarks, GAME is competitive with, or the best among, global low-rank, side-information, and modern imputation baselines, with the largest gains when subgroups exhibit distinct low-rank structure.
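To make the idea of overlapping nuclear-norm penalties concrete, the following is a minimal sketch of how such an objective could be evaluated. The abstract does not state the exact GAME objective, so everything here is an assumption: a squared loss on observed entries, one nuclear-norm term per row group, and the hypothetical names `overlapping_nuclear_objective`, `groups`, `lam`, and `weights`. The point is only that each row may appear in several groups, so its submatrix is penalized more than once.

```python
import numpy as np


def overlapping_nuclear_objective(X, Y, mask, groups, lam=1.0, weights=None):
    """Evaluate an assumed group-aware objective (not necessarily GAME's).

    X      : candidate estimate, shape (n, p)
    Y      : data matrix, shape (n, p); unobserved entries are ignored
    mask   : 0/1 array, shape (n, p); 1 marks an observed entry
    groups : dict mapping a group name to a list of row indices;
             groups may overlap (a row can sit in several groups)
    lam    : overall regularization strength (hypothetical parameter)
    weights: optional dict of per-group weights (hypothetical parameter)
    """
    if weights is None:
        weights = {name: 1.0 for name in groups}

    # Squared-error fit, restricted to observed entries.
    residual = mask * (X - Y)
    fit = 0.5 * np.sum(residual ** 2)

    # One nuclear norm (sum of singular values) per row submatrix.
    penalty = sum(
        weights[name] * np.linalg.norm(X[rows, :], ord="nuc")
        for name, rows in groups.items()
    )
    return fit + lam * penalty


# Toy example: four rows, each in one age group and one region group.
Y = np.eye(4)
mask = np.ones_like(Y)
groups = {
    "age_young": [0, 1],
    "age_old": [2, 3],
    "region_a": [0, 2],
    "region_b": [1, 3],
}
value = overlapping_nuclear_objective(Y, Y, mask, groups, lam=0.5)
print(value)  # 4.0: zero fit term; each 2x4 submatrix has nuclear norm 2
```

Because every term is convex in `X`, an objective of this form could be passed to a generic convex solver or minimized with a proximal splitting method. The abstract does not say which algorithm the authors use.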