We consider a variant of matrix completion in which entries are revealed in a biased manner, adopting a model akin to that introduced by Ma and Chen. Rather than treating this observation bias as a disadvantage, as is typically done, our goal is to exploit the information shared between the bias and the outcome of interest to improve predictions. Towards this, we consider a natural model in which the observation pattern and the outcome of interest are driven by the same set of underlying latent (unobserved) factors. This leads to a two-stage matrix completion algorithm. First, recover the latent factors (or the distances between them) by applying matrix completion to the fully observed, noisy binary matrix corresponding to the observation pattern. Second, use the recovered latent factors as features and the sparsely observed noisy outcomes as labels to perform non-parametric supervised learning. Our finite-sample error analysis suggests that, ignoring logarithmic factors, this approach is competitive with the corresponding parametric rates for supervised learning. This implies that, by exploiting the information shared between the bias and the outcomes, the two-stage method performs comparably to having direct access to the unobserved latent factors. In an empirical evaluation on a real-world dataset, the two-stage algorithm yields estimates with 30x smaller mean squared error than traditional matrix completion methods, suggesting the utility of the proposed model and method.
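The two-stage procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. Stage 1 uses a plain rank-r truncated SVD of the observation mask as a stand-in for whichever matrix completion estimator is actually used. Stage 2 uses brute-force k-nearest-neighbour regression as the non-parametric learner. The function names `estimate_latent_features` and `knn_predict`, the rank, `k`, and the synthetic data-generating model (logistic reveal probabilities, sinusoidal outcomes) are all hypothetical choices made for illustration.

```python
import numpy as np


def estimate_latent_features(A, rank):
    """Stage 1 (sketch): rank-`rank` truncated SVD of the binary mask A.

    A[i, j] = 1 if entry (i, j) was revealed, else 0. The truncation
    P_hat = U_r S_r V_r^T estimates the reveal probabilities E[A].
    Rows of U_r S_r (resp. V_r S_r) have the same pairwise Euclidean
    distances as the rows (resp. columns) of P_hat, so they are used
    here as proxies for the row (resp. column) latent factors.
    """
    U, s, Vt = np.linalg.svd(A.astype(float), full_matrices=False)
    row_feats = U[:, :rank] * s[:rank]       # n x rank
    col_feats = Vt[:rank, :].T * s[:rank]    # m x rank
    return row_feats, col_feats


def knn_predict(row_feats, col_feats, Y, A, k):
    """Stage 2 (sketch): k-nearest-neighbour regression.

    The feature vector of entry (i, j) is (row_feats[i], col_feats[j]);
    labels are the noisy outcomes Y[i, j], available only where A == 1.
    Each entry is predicted as the mean label of its k nearest observed
    entries in feature space (brute force, for small examples only).
    """
    obs_i, obs_j = np.nonzero(A)
    X_obs = np.hstack([row_feats[obs_i], col_feats[obs_j]])
    y_obs = Y[obs_i, obs_j]
    n, m = A.shape
    Y_hat = np.empty((n, m))
    for i in range(n):
        for j in range(m):
            x = np.concatenate([row_feats[i], col_feats[j]])
            dist = np.linalg.norm(X_obs - x, axis=1)
            nearest = np.argsort(dist)[:k]
            Y_hat[i, j] = y_obs[nearest].mean()
    return Y_hat


# Hypothetical synthetic example: the same latent pair (u_i, v_j) drives
# both whether (i, j) is revealed and the value of the outcome there.
rng = np.random.default_rng(0)
n, m, d = 60, 50, 2
u = rng.uniform(size=(n, d))   # hypothetical row latent factors
v = rng.uniform(size=(m, d))   # hypothetical column latent factors
P = 1.0 / (1.0 + np.exp(-3.0 * (u @ v.T - 0.5)))   # reveal probabilities
A = (rng.uniform(size=(n, m)) < P).astype(int)     # observation mask
Y_true = np.sin(np.pi * (u @ v.T))                 # outcome of interest
noise = 0.1 * rng.standard_normal((n, m))
Y = np.where(A == 1, Y_true + noise, np.nan)       # NaN = unobserved

row_feats, col_feats = estimate_latent_features(A, rank=d + 1)
Y_hat = knn_predict(row_feats, col_feats, Y, A, k=10)
print("MSE on unobserved entries:",
      np.mean((Y_hat - Y_true)[A == 0] ** 2))
```

Note that stage 1 never looks at the outcomes: it only uses the fully observed mask `A`, which is what lets the bias in the observation pattern be turned into features. The brute-force neighbour search costs O(nm·|Ω|) distance evaluations, where |Ω| is the number of observed entries, so a real implementation would likely swap in an indexed nearest-neighbour structure or a kernel smoother.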