We consider a variant of matrix completion where entries are revealed in a biased manner, adopting a model akin to that introduced by Ma and Chen. Instead of treating this observation bias as a disadvantage, as is typically the case, our goal is to exploit the information shared between the bias and the outcome of interest to improve predictions. To this end, we propose a simple two-stage algorithm: (i) interpreting the observation pattern as a fully observed noisy matrix, we apply traditional matrix completion methods to it to estimate the distances between the latent factors; (ii) we apply supervised learning on the recovered features to impute missing observations. We establish finite-sample error rates that are competitive with the corresponding supervised learning parametric rates, suggesting that our learning performance is comparable to having access to the unobserved covariates. An empirical evaluation on a real-world dataset shows similar performance gains: our algorithm's estimates have a 30x smaller mean squared error than those of traditional matrix completion methods.
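The two-stage procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a truncated SVD as the "traditional matrix completion method" in stage (i) and k-nearest-neighbour averaging over rows as the supervised learner in stage (ii). The function names `estimate_row_factors`, `knn_impute`, and `two_stage_impute`, and the parameters `rank` and `k`, are hypothetical.

```python
import numpy as np


def estimate_row_factors(mask, rank):
    """Stage (i): treat the 0/1 observation mask as a fully observed noisy
    matrix and keep its top-`rank` singular directions as row features.
    (Assumption: truncated SVD stands in for the completion method.)"""
    u, s, _ = np.linalg.svd(np.asarray(mask, dtype=float), full_matrices=False)
    return u[:, :rank] * np.sqrt(s[:rank])


def knn_impute(values, mask, row_feats, k=5):
    """Stage (ii): fill each missing entry (i, j) with the mean of column j
    over the k rows closest to row i in the estimated feature space.
    (Assumption: k-NN averaging stands in for the supervised learner.)"""
    observed = np.asarray(mask).astype(bool)
    fallback = values[observed].mean()  # used when a column has no observations
    # Pairwise Euclidean distances between estimated row factors.
    diffs = row_feats[:, None, :] - row_feats[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    # Keep observed entries; unobserved entries start at 0 and are overwritten.
    filled = np.where(observed, values, 0.0).astype(float)
    for j in range(values.shape[1]):
        donors = np.flatnonzero(observed[:, j])
        for i in np.flatnonzero(~observed[:, j]):
            if donors.size == 0:
                filled[i, j] = fallback
                continue
            nearest = donors[np.argsort(dists[i, donors])[:k]]
            filled[i, j] = values[nearest, j].mean()
    return filled


def two_stage_impute(values, mask, rank=2, k=5):
    """Run stage (i) on the mask, then stage (ii) on the observed values."""
    row_feats = estimate_row_factors(mask, rank)
    return knn_impute(values, mask, row_feats, k=k)
```

Here `values` is an array whose unobserved entries may hold any placeholder (for example `np.nan`), and `mask` marks observed entries with 1. Only row factors are used in this sketch; a fuller version might also use the column factors from the same decomposition.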