Given pairwise comparisons among multiple items, how should we rank them so that the ranking matches the observations? This problem, known as rank aggregation, has many applications in sports, recommendation systems, and other web applications. Because finding a global ranking that minimizes the mismatch (known as Kemeny optimization) is NP-hard in general, we focus on the Erd\H{o}s-R\'enyi outliers (ERO) model for this ranking problem. Here, each pairwise comparison is a corrupted copy of the true score difference. We investigate spectral ranking algorithms based on unnormalized and normalized data matrices. The key is to understand how well they recover the underlying score of each item from the observed data. This reduces to deriving an entrywise perturbation bound between the top eigenvectors of the unnormalized/normalized data matrix and those of its population counterpart. Using the leave-one-out technique, we provide a sharper $\ell_{\infty}$-norm perturbation bound for the eigenvectors and also derive an error bound on the maximum displacement of each item, with only $\Omega(n\log n)$ samples. Our theoretical analysis improves upon state-of-the-art results in terms of sample complexity, and our numerical experiments confirm these theoretical findings.
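To make the setting concrete, the following is a minimal sketch of an ERO-style simulation and an unnormalized spectral estimator. It is an illustration under stated assumptions, not necessarily the algorithm analyzed here. The function names `simulate_ero` and `spectral_scores`, the parameters `p` (observation probability) and `eta` (probability that an observed comparison is clean), and the choice of outliers drawn uniformly from $[-M, M]$ with $M$ the score range are all assumptions made for the example. The estimator uses the top two left singular vectors of the skew-symmetric comparison matrix, one common SVD-based variant, and recovers scores only up to shift and positive scaling.

```python
import numpy as np

def simulate_ero(r, p, eta, rng):
    """Skew-symmetric comparison matrix H under an assumed ERO-style model.

    Each pair i < j is observed with probability p. An observed entry equals
    r[i] - r[j] with probability eta, otherwise a uniform outlier in [-M, M].
    """
    n = len(r)
    M = r.max() - r.min()
    clean = r[:, None] - r[None, :]
    outlier = rng.uniform(-M, M, size=(n, n))
    observed = rng.random((n, n)) < p
    is_clean = rng.random((n, n)) < eta
    H = np.where(is_clean, clean, outlier) * observed
    H = np.triu(H, k=1)          # keep one draw per pair (i < j)
    return H - H.T               # enforce H[j, i] = -H[i, j]

def spectral_scores(H):
    """Estimate scores (up to shift and positive scale) from H."""
    n = H.shape[0]
    U, _, _ = np.linalg.svd(H)
    U2 = U[:, :2]                # top-2 left singular subspace
    # Coordinates of the normalized all-ones vector inside that subspace.
    a = U2.T @ (np.ones(n) / np.sqrt(n))
    a /= np.linalg.norm(a)
    # Unit vector in the subspace orthogonal to the all-ones direction.
    x = U2 @ np.array([-a[1], a[0]])
    # Fix the sign so that x agrees with the observed comparisons.
    if np.sum(H * (x[:, None] - x[None, :])) < 0:
        x = -x
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r = rng.permutation(100).astype(float)   # hypothetical true scores
    H = simulate_ero(r, p=0.8, eta=0.9, rng=rng)
    x = spectral_scores(H)
    ranking = np.argsort(-x)                 # highest estimated score first
    print(ranking[:10])
```

In the noise-free case (`p = 1`, `eta = 1`) the matrix has rank two with column space spanned by the score vector and the all-ones vector, which is why removing the all-ones direction from the top-two subspace leaves the centered scores.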