In this paper, we study the low-rank matrix completion problem, a class of machine learning problems that aims to predict the missing entries of a partially observed matrix. Such problems arise in several challenging applications, such as collaborative filtering, image processing, and genotype imputation. We compare Bayesian approaches with a recently introduced de-biased estimator, which provides a useful way to build confidence intervals of interest. From a theoretical viewpoint, the de-biased estimator attains a sharp minimax-optimal rate of estimation error, whereas the Bayesian approaches attain this rate only up to an additional logarithmic factor. Our simulation studies show the interesting result that the de-biased estimator performs comparably to the Bayesian estimators. Moreover, the Bayesian approaches are much more stable and can outperform the de-biased estimator when the sample size is small. We also find that the empirical coverage rate of the confidence intervals for an entry obtained from the de-biased estimator is considerably lower than that of the considered credible intervals. These results call for further theoretical study of the estimation error and posterior concentration of Bayesian methods, which remains quite limited to date.