This paper investigates statistical inference for noisy matrix completion in a semi-supervised model when auxiliary covariates are available. The model consists of two parts: a low-rank matrix induced by unobserved latent factors, and a term that captures the effects of the observed covariates through a coefficient matrix composed of high-dimensional column vectors. We model the observation pattern of the responses through a logistic regression on the covariates, and allow the observation probability to go to zero as the sample size increases. We apply an iterative least squares (LS) estimation approach in this setting. Iterative LS methods generally enjoy a low computational cost, but deriving the statistical properties of the resulting estimators is challenging. We show that our method needs only a few iterations, and that the resulting entry-wise estimators of the low-rank matrix and the coefficient matrix are guaranteed to be asymptotically normal. As a result, individual inference can be conducted for each entry of the unknown matrices. We also propose a simultaneous testing procedure based on the multiplier bootstrap for the high-dimensional coefficient matrix. This simultaneous inferential tool helps to further investigate the effects of the covariates on the prediction of missing entries.
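To make the iterative LS idea concrete, the following is a minimal sketch under assumptions that the abstract does not spell out. It assumes an additive form in which the response matrix `Y` (n by m) is approximated by `X @ B + F @ Lam.T`, where `X` is the observed covariate matrix, `B` the coefficient matrix, and `F`, `Lam` rank-`rank` factor matrices. The function name `iterative_ls`, the random initialization, and the update order are all illustrative choices, not the paper's procedure. The sketch covers neither the logistic observation model nor the inference steps.

```python
import numpy as np

def iterative_ls(Y, X, mask, rank, n_iter=5, seed=0):
    """Alternating least squares on observed entries only (illustrative sketch).

    Y    : (n, m) responses; values where mask is False are ignored
    X    : (n, p) observed covariates
    mask : (n, m) boolean array, True where Y is observed
    """
    n, m = Y.shape
    p = X.shape[1]
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((n, rank))   # latent row factors, random start (assumption)
    B = np.zeros((p, m))                 # covariate coefficients, one column per response column
    Lam = np.zeros((m, rank))            # column loadings

    for _ in range(n_iter):
        # Column step: for each column j, regress observed y_ij on [x_i, f_i].
        Z = np.hstack([X, F])
        for j in range(m):
            obs = mask[:, j]
            coef = np.linalg.lstsq(Z[obs], Y[obs, j], rcond=None)[0]
            B[:, j] = coef[:p]
            Lam[j] = coef[p:]

        # Row step: for each row i, regress covariate-adjusted residuals on the loadings.
        R = Y - X @ B
        for i in range(n):
            obs = mask[i]
            F[i] = np.linalg.lstsq(Lam[obs], R[i, obs], rcond=None)[0]

    return B, F @ Lam.T   # estimated coefficient matrix and low-rank part
```

Each step solves an ordinary least squares problem restricted to the observed entries, so the observed-entry squared error cannot increase from one step to the next. This is one reason such schemes are computationally cheap. Whether a few iterations suffice for valid inference is the paper's theoretical contribution and is not demonstrated by this sketch.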