Sample-efficient inductive matrix completion with noise and inexact side-information

Inductive matrix completion (IMC) is a variant of low-rank matrix completion that incorporates row and column side-information. In principle, it can reduce the effective dimension of the recovery problem from the ambient matrix size to the dimension of the side-information features. Existing theory, however, does not fully realize this advantage in the noisy setting: sample-efficient guarantees apply only to noiseless recovery, while noisy guarantees require sample sizes comparable to those of ordinary matrix completion. This paper closes that gap for noisy IMC. We analyze a nonconvex projected gradient descent algorithm with spectral initialization and prove that, under exact side-information, it achieves linear convergence and stable recovery at a sample complexity governed by the effective side-information dimension rather than the ambient matrix dimension. The key technical ingredient is a local regularity condition for the IMC loss that holds at this reduced sample size, despite the mismatch between the observation pattern and the side-information subspaces. We further extend the analysis to inexact side-information, showing that the same reduced sample complexity is preserved and that the estimation error degrades optimally with the level of subspace misspecification. Motivated by this trade-off, we also propose a penalized interpolation between IMC and ordinary matrix completion that balances sample efficiency against robustness to imperfect side-information. Simulations and experiments on the MovieLens dataset support the theoretical findings and illustrate the practical benefits of exploiting side-information in low-sample regimes.
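To make the algorithmic setting concrete, the following is a minimal NumPy sketch of factored gradient descent for IMC with spectral initialization, under exact side-information. It is an illustration under stated assumptions, not the paper's exact algorithm: the abstract does not specify the loss, the step size, or the constraint set. The sketch assumes side-information matrices `A` and `B` with orthonormal columns, a squared loss rescaled by the empirical sampling rate, a standard balancing regularizer, and a simple Frobenius-ball projection standing in for whatever projection the analysis actually uses. All function names (`make_problem`, `spectral_init`, `project_ball`, `imc_pgd`) and default parameters are hypothetical.

```python
import numpy as np


def make_problem(n1=60, n2=60, d1=8, d2=8, r=2, p=0.5, sigma=0.005, seed=0):
    """Synthetic noisy IMC instance: M = A X B^T observed on a random mask."""
    rng = np.random.default_rng(seed)
    # Side-information with orthonormal columns (assumption of this sketch).
    A, _ = np.linalg.qr(rng.standard_normal((n1, d1)))
    B, _ = np.linalg.qr(rng.standard_normal((n2, d2)))
    # Rank-r core matrix with singular values spaced between 2 and 1.
    Us, _ = np.linalg.qr(rng.standard_normal((d1, r)))
    Vs, _ = np.linalg.qr(rng.standard_normal((d2, r)))
    X_true = Us @ np.diag(np.linspace(2.0, 1.0, r)) @ Vs.T
    M = A @ X_true @ B.T
    mask = (rng.random((n1, n2)) < p).astype(float)
    Y = mask * (M + sigma * rng.standard_normal((n1, n2)))
    return Y, mask, A, B, X_true


def spectral_init(Y, mask, A, B, r, p):
    """Top-r SVD of the rescaled observations projected onto the feature spaces."""
    X0 = A.T @ (mask * Y) @ B / p
    U, s, Vt = np.linalg.svd(X0, full_matrices=False)
    root = np.sqrt(s[:r])
    return U[:, :r] * root, Vt[:r].T * root, s[0]


def project_ball(F, radius):
    """Placeholder projection: rescale F onto a Frobenius-norm ball."""
    norm = np.linalg.norm(F)
    return F if norm <= radius else F * (radius / norm)


def imc_pgd(Y, mask, A, B, r, n_iters=300, step=0.2):
    """Projected gradient descent on the factors (U, V) of X = U V^T."""
    p = mask.mean()  # empirical sampling rate
    U, V, s1 = spectral_init(Y, mask, A, B, r, p)
    eta = step / s1  # step size scaled by the top singular value of the init
    radius = 2.0 * max(np.linalg.norm(U), np.linalg.norm(V))
    for _ in range(n_iters):
        # Residual on observed entries only.
        R = mask * (A @ U @ V.T @ B.T - Y)
        # Gradient of the rescaled squared loss with respect to X (d1 x d2).
        G = A.T @ R @ B / p
        # Balancing term keeps U^T U and V^T V close to each other.
        bal = U.T @ U - V.T @ V
        grad_U = G @ V + 0.5 * U @ bal
        grad_V = G.T @ U - 0.5 * V @ bal
        U = project_ball(U - eta * grad_U, radius)
        V = project_ball(V - eta * grad_V, radius)
    return U @ V.T
```

A typical use is `Y, mask, A, B, X_true = make_problem()` followed by `X_hat = imc_pgd(Y, mask, A, B, r=2)`, after which `A @ X_hat @ B.T` is the estimate of the full matrix. Note that every iterate lives in the `d1 x d2` feature space rather than the `n1 x n2` ambient space, which is the dimension reduction the abstract refers to.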