Linear principal component analysis (PCA), nonlinear PCA, and linear independent component analysis (ICA) are three methods with single-layer autoencoder formulations for learning linear transformations from data. Linear PCA learns orthogonal transformations (rotations) that orient axes to maximise variance, but it suffers from a subspace rotational indeterminacy: it fails to find a unique rotation for axes that share the same variance. Both nonlinear PCA and linear ICA reduce the subspace indeterminacy from rotational to permutational by maximising statistical independence under the assumption of unit variance. The main difference between them is that nonlinear PCA learns only rotations, while linear ICA learns any linear transformation that yields unit variance. The relationship between all three can be understood through the singular value decomposition of the linear ICA transformation into a sequence of rotation, scale, rotation: linear PCA learns the first rotation, nonlinear PCA learns the second, and the scale is simply the inverse of the standard deviations. The problem is that, in contrast to linear PCA, conventional nonlinear PCA cannot be applied directly to the data to learn the first rotation, which is special because it reduces dimensionality and orders by variance. In this paper, we identify the cause and propose, as a solution, $\sigma$-PCA: a unified neural model for linear and nonlinear PCA as single-layer autoencoders. One of its key ingredients is modelling not just the rotation but also the scale, that is, the variances. This model bridges the disparity between linear and nonlinear PCA. Like linear PCA, it can learn a semi-orthogonal transformation that reduces dimensionality and orders by variance; unlike linear PCA, it does not suffer from rotational indeterminacy.
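The rotation, scale, rotation decomposition described above can be illustrated with a small NumPy sketch. This is not the $\sigma$-PCA model itself: it uses a plain eigendecomposition rather than an autoencoder, and the toy data, the mixing matrix `A`, and the angle `theta` are all hypothetical choices made for illustration. It shows only that the first rotation and the scale together whiten the data, and that any second rotation leaves the covariance at identity, which is why second-order statistics alone cannot determine it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two independent, non-Gaussian (uniform) sources,
# linearly mixed by an arbitrary matrix A, then centred.
n = 10_000
s = rng.uniform(-1.0, 1.0, size=(n, 2))
A = np.array([[2.0, 1.0], [0.5, 1.5]])
x = s @ A.T
x = x - x.mean(axis=0)

# First rotation (what linear PCA learns): eigenvectors of the covariance.
cov = x.T @ x / n
variances, E = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
order = np.argsort(variances)[::-1]      # reorder by descending variance
variances, E = variances[order], E[:, order]

# Scale: the inverse of the standard deviations along the principal axes.
inv_std = 1.0 / np.sqrt(variances)

# Rotate, then scale: the result has identity covariance (whitened data).
z = (x @ E) * inv_std
cov_z = z.T @ z / n

# Second rotation: any rotation R keeps the covariance at identity, so it
# must be chosen by another criterion (e.g. statistical independence).
theta = 0.7                              # arbitrary illustrative angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
y = z @ R
cov_y = y.T @ y / n

# The full linear transformation as rotation, scale, rotation.
W = E @ np.diag(inv_std) @ R
```

Here `x @ W` reproduces `y`, and both `cov_z` and `cov_y` equal the identity up to floating-point error. The angle `theta` is left arbitrary on purpose: finding the value that makes the outputs independent is the part that nonlinear PCA or linear ICA would have to learn.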