Many statistical applications, such as Principal Component Analysis, matrix completion, and tensor regression, rely on accurate estimation of the leading eigenvectors of a matrix. The Davis-Kahan theorem is instrumental for bounding the distance between the matrix $U$ of population eigenvectors and its sample version $\widehat{U}$. While this distance can be measured in various metrics, recent developments have shown the advantages of measuring the deviation in the two-to-infinity norm. The purpose of this paper is to develop a toolbox for deriving upper bounds on the distance between $U$ and $\widehat{U}$ in the two-to-infinity norm in a variety of scenarios. Although this problem has been studied by several authors, this paper differs from its predecessors in that the upper bounds are obtained under various sets of assumptions. The bounds are first derived with no or only mild probabilistic assumptions on the errors, and are then refined when some generic probabilistic assumptions on the errors hold. The paper also adjusts the upper bounds for the cases of heavy-tailed or exponentially fast decaying errors. In addition, the paper suggests alternative methods for evaluating $\widehat{U}$ and thereby enables one to compare the resulting accuracies. As an example application of these techniques, we derive sufficient conditions for perfect clustering in a generic setting and then employ them in various scenarios.
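To make the quantity being bounded concrete, the following is a minimal numerical sketch, not the paper's method. It uses the standard definition of the two-to-infinity norm as the largest Euclidean row norm of a matrix. Two things are assumptions made here for illustration: eigenvector matrices are compared after alignment by an orthogonal Procrustes rotation, which is a common convention, and the function names, matrix sizes, and noise level are all hypothetical.

```python
import numpy as np


def two_to_infinity_norm(A):
    """Two-to-infinity norm: the largest Euclidean norm among the rows of A."""
    return float(np.max(np.linalg.norm(A, axis=1)))


def leading_eigenvectors(M, r):
    """Eigenvectors of a symmetric matrix M for its r largest-magnitude eigenvalues."""
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(np.abs(vals))[::-1][:r]
    return vecs[:, order]


def aligned_two_to_infinity_distance(U, U_hat):
    """Two-to-infinity norm of U_hat @ W - U, with W the orthogonal Procrustes rotation.

    Assumption: alignment uses the Frobenius-optimal orthogonal W, obtained
    from the SVD of U_hat.T @ U. Other alignment conventions are possible.
    """
    X, _, Yt = np.linalg.svd(U_hat.T @ U)
    W = X @ Yt
    return two_to_infinity_norm(U_hat @ W - U)


# Illustrative setup (hypothetical sizes and noise level): a rank-r symmetric
# signal matrix observed with symmetric Gaussian noise.
rng = np.random.default_rng(0)
n, r = 200, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
P = Q @ np.diag([30.0, 20.0, 10.0]) @ Q.T      # population matrix
E = rng.standard_normal((n, n))
E = 0.1 * (E + E.T) / np.sqrt(2.0)             # symmetric error matrix

U = leading_eigenvectors(P, r)
U_hat = leading_eigenvectors(P + E, r)
print("two-to-infinity distance:", aligned_two_to_infinity_distance(U, U_hat))
```

Because the two-to-infinity norm is a maximum over rows, it controls the error in every row of $\widehat{U}$ simultaneously, which is the kind of row-wise control that perfect clustering guarantees typically rely on.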