Principal Component Analysis (PCA) aims to find subspaces spanned by the so-called principal components that best represent the variance in the dataset. The deflation method is a popular meta-algorithm that sequentially finds individual principal components, starting from the most important ones and working towards the less important ones. However, as deflation proceeds, numerical errors from the imprecise estimation of principal components propagate through subsequent steps due to the method's sequential nature. This paper mathematically characterizes the error propagation of the inexact Hotelling's deflation method. We consider two scenarios: $i)$ when the sub-routine for finding the leading eigenvector is abstract and can represent various algorithms; and $ii)$ when power iteration is used as the sub-routine. In the latter case, the additional directional information from power iteration allows us to obtain a tighter error bound than in the sub-routine-agnostic case. For both scenarios, we explicitly characterize how the errors progress and affect subsequent principal component estimations.
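To make the sequential structure concrete, the following is a minimal sketch of Hotelling's deflation with power iteration as the sub-routine. The function names, iteration counts, and tolerances are illustrative choices, not taken from the paper; the sketch only shows where an inexact leading-eigenvector estimate enters the deflation step and can therefore perturb all later estimates.

```python
import numpy as np

def power_iteration(A, num_iters=1000, tol=1e-10, seed=0):
    """Approximate the leading eigenvector of a symmetric PSD matrix A."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        w = A @ v
        w /= np.linalg.norm(w)
        if np.linalg.norm(w - v) < tol:
            return w
        v = w
    return v

def hotelling_deflation(A, k):
    """Sequentially estimate the top-k eigenvectors of A.

    After each estimate v_i, Hotelling's deflation subtracts the
    corresponding rank-one component:
        A <- A - (v_i^T A v_i) v_i v_i^T.
    If v_i is inexact, the deflated matrix is perturbed, so the error
    propagates to every subsequent estimate -- the phenomenon the
    paper's bounds quantify.
    """
    A = A.copy()
    vecs = []
    for _ in range(k):
        v = power_iteration(A)
        lam = v @ A @ v            # Rayleigh-quotient eigenvalue estimate
        A -= lam * np.outer(v, v)  # Hotelling's deflation step
        vecs.append(v)
    return np.array(vecs)
```

In exact arithmetic each deflation step zeroes out the eigenvalue just found, so the next call to the sub-routine targets the next principal component; with inexact estimates the subtracted rank-one term is slightly misaligned, which is precisely the error source the two scenarios above analyze.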