Given only a few observed entries from a low-rank matrix $X$, matrix completion is the problem of imputing the missing entries; it formalizes a wide range of real-world settings that involve estimating missing data. However, when there are too few observed entries to complete the matrix, what other aspects of the underlying matrix can be reliably recovered? We study one such problem setting, "one-sided" matrix completion, where the goal is to recover the right singular vectors of $X$ even in the regime where recovering the left singular vectors is impossible. This regime arises when there are more rows than columns and very few observations. We propose a natural algorithm that imputes the missing values of the matrix $X^TX$, and we show that even with only two observations per row of $X$, we can provably recover $X^TX$ as long as there are $\Omega(r^2 d \log d)$ rows, where $r$ is the rank and $d$ is the number of columns. We evaluate our algorithm on one-sided recovery of synthetic data and on low-coverage genome sequencing. In these settings, our algorithm substantially outperforms standard matrix completion and a variety of direct factorization methods.
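To make the idea of working with $X^TX$ concrete, the following is a minimal sketch and not the algorithm analyzed in the paper. It assumes each row reveals two entries in distinct columns chosen uniformly at random, uses the product of those two entries as a noisy observation of one off-diagonal entry of $X^TX$, and uses their squares for the diagonal. In place of a genuine completion step it simply rescales by the inverse sampling probability and takes the top-$r$ eigenvectors, which only works here because the demo uses far more rows than the $\Omega(r^2 d \log d)$ regime. The function names `estimate_gram`, `top_eigvecs`, and `run_demo`, and all parameter values, are hypothetical.

```python
import numpy as np


def estimate_gram(cols_a, cols_b, vals_a, vals_b, d):
    """Unbiased estimate of X^T X from two observed entries per row.

    Assumes (cols_a[i], cols_b[i]) is a uniformly random pair of distinct
    columns for row i, with observed values vals_a[i] and vals_b[i].
    """
    n_pairs = d * (d - 1) / 2
    G = np.zeros((d, d))
    prod = vals_a * vals_b
    # Accumulate products symmetrically; np.add.at handles repeated indices.
    np.add.at(G, (cols_a, cols_b), prod)
    np.add.at(G, (cols_b, cols_a), prod)
    # Each unordered column pair is sampled with probability 1 / n_pairs.
    G *= n_pairs
    # Each column appears in a given row's sample with probability 2 / d.
    diag = np.zeros(d)
    np.add.at(diag, cols_a, vals_a ** 2)
    np.add.at(diag, cols_b, vals_b ** 2)
    G[np.diag_indices(d)] = diag * d / 2
    return G


def top_eigvecs(G, r):
    """Eigenvectors of the symmetric matrix G for its r largest eigenvalues."""
    _, vecs = np.linalg.eigh(G)  # eigenvalues are returned in ascending order
    return vecs[:, -r:]


def run_demo(n=200_000, d=20, r=2, seed=0):
    """Synthetic check: returns the spectral-norm distance between the
    estimated and true projectors onto the right singular subspace."""
    rng = np.random.default_rng(seed)
    B, _ = np.linalg.qr(rng.standard_normal((d, r)))  # d x r, orthonormal columns
    A = rng.standard_normal((n, r))
    X = A @ B.T  # rank-r matrix whose right singular subspace is span(B)

    # Two distinct observed columns per row.
    cols_a = rng.integers(0, d, size=n)
    cols_b = (cols_a + rng.integers(1, d, size=n)) % d
    idx = np.arange(n)

    G = estimate_gram(cols_a, cols_b, X[idx, cols_a], X[idx, cols_b], d)
    V_hat = top_eigvecs(G, r)
    return np.linalg.norm(V_hat @ V_hat.T - B @ B.T, 2)


if __name__ == "__main__":
    print("subspace error:", run_demo())
```

With only two entries per row, the left factor `A` cannot be recovered row by row, yet the averaged products still carry information about the column space. When the number of rows is closer to $d \log d$, most column pairs are never observed together, and the rescaling step above would need to be replaced by an actual low-rank completion of the partially observed $X^TX$.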