Many important tasks of large-scale recommender systems can be naturally cast as testing multiple linear forms for noisy matrix completion. These problems, however, present unique challenges because of the subtle bias-variance tradeoff and the intricate dependence among the estimated entries induced by the low-rank structure. In this paper, we develop a general approach to overcome these difficulties by introducing new statistics for individual tests with sharp asymptotics both marginally and jointly, and utilizing them to control the false discovery rate (FDR) via a data-splitting and symmetric aggregation scheme. We show that, with the proposed methodology, valid FDR control can be achieved with guaranteed power under nearly optimal sample size requirements. Extensive numerical simulations and real data examples are also presented to further illustrate its practical merits.
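To make the idea of FDR control via data splitting and symmetric aggregation more concrete, the following is a minimal sketch of a generic mirror-statistic style selection rule. It is an illustration under stated assumptions, not the procedure developed in the paper: it assumes that each hypothesis already has two test statistics computed on independent halves of the data (`t1` and `t2`), that these are roughly symmetric about zero under the null, and the function names `mirror_statistics`, `fdr_threshold`, and `select` are hypothetical.

```python
import numpy as np


def mirror_statistics(t1, t2):
    """Combine two per-hypothesis statistics into one symmetric score.

    Assumption: t1 and t2 come from independent data splits. The score is
    large and positive when both halves agree in sign and magnitude, and
    is symmetric about zero when the null hypothesis holds.
    """
    t1 = np.asarray(t1, dtype=float)
    t2 = np.asarray(t2, dtype=float)
    return np.sign(t1 * t2) * (np.abs(t1) + np.abs(t2))


def fdr_threshold(m, q):
    """Return the smallest cutoff whose estimated FDP is at most q.

    The count of scores below -t serves as an estimate of the number of
    false discoveries among the scores above t.
    """
    candidates = np.sort(np.abs(m[m != 0]))
    for t in candidates:
        fdp_hat = np.sum(m <= -t) / max(np.sum(m >= t), 1)
        if fdp_hat <= q:
            return t
    return np.inf  # no cutoff meets the target level; select nothing


def select(t1, t2, q=0.1):
    """Indices of hypotheses rejected at nominal FDR level q."""
    m = mirror_statistics(t1, t2)
    tau = fdr_threshold(m, q)
    return np.flatnonzero(m >= tau)
```

The sign-symmetry of the scores under the null is what lets the left tail stand in for the unknown number of false positives in the right tail; how the per-split statistics are constructed for linear forms of a low-rank matrix is the part this sketch leaves out.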