Many statistical applications, such as Principal Component Analysis, matrix completion, and tensor regression, rely on accurate estimation of the leading eigenvectors of a matrix. The Davis-Kahan theorem is instrumental for bounding from above the distances between the matrix $U$ of population eigenvectors and its sample version $\widehat{U}$. While those distances can be measured in various metrics, recent developments have demonstrated the advantages of evaluating the deviation in the two-to-infinity norm. The purpose of this paper is to provide upper bounds for the distances between $U$ and $\widehat{U}$ in the two-to-infinity norm for a variety of possible scenarios and competing approaches. Although this problem has been studied by several authors, the difference between this paper and its predecessors is that the upper bounds are obtained with no, or only mild, probabilistic assumptions on the error distributions. Those bounds are subsequently refined when some generic probabilistic assumptions on the errors hold. In addition, the paper provides alternative methods for the evaluation of $\widehat{U}$ and thereby enables one to compare the resulting accuracies. As an example of an application of our results, we derive sufficient conditions for perfect clustering in a generic setting, and then employ them in various scenarios.
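To make the central quantity concrete, the sketch below computes the two-to-infinity norm (the maximum row-wise Euclidean norm) and the distance between $U$ and $\widehat{U}$ after the usual orthogonal alignment. This is a minimal numpy illustration with made-up data; the function names and the Procrustes-style alignment are illustrative assumptions, not the paper's method.

```python
import numpy as np

def two_to_infty_norm(A):
    # ||A||_{2->inf}: the largest Euclidean norm among the rows of A.
    return np.max(np.linalg.norm(A, axis=1))

def aligned_distance(U, U_hat):
    # Eigenvectors are identified only up to an orthogonal rotation,
    # so align U_hat to U via the orthogonal Procrustes solution
    # before measuring the residual in the two-to-infinity norm.
    W, _, Vt = np.linalg.svd(U_hat.T @ U)
    R = W @ Vt  # best orthogonal alignment of U_hat to U
    return two_to_infty_norm(U_hat @ R - U)

rng = np.random.default_rng(0)
# Toy example: U has orthonormal columns; U_hat is a slightly perturbed copy.
U, _ = np.linalg.qr(rng.standard_normal((50, 3)))
U_hat, _ = np.linalg.qr(U + 0.01 * rng.standard_normal((50, 3)))
print(aligned_distance(U, U_hat))  # small, reflecting the mild perturbation
```

Measuring the residual row by row, rather than in the spectral or Frobenius norm, is what makes entrywise guarantees (and hence perfect-clustering conditions) possible.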