Linear principal component analysis (PCA) learns (semi-)orthogonal transformations by orienting the axes to maximize variance. Consequently, it can only identify orthogonal axes whose variances are clearly distinct; it cannot identify subsets of axes whose variances are roughly equal. It cannot eliminate the subspace rotational indeterminacy: it fails to disentangle components with equal variances (eigenvalues), resulting in randomly rotated axes within each eigensubspace. In this paper, we propose $\sigma$-PCA, a method that (1) formulates a unified model for linear and nonlinear PCA, the latter being a special case of linear independent component analysis (ICA), and (2) introduces a missing piece into nonlinear PCA that allows it to eliminate the subspace rotational indeterminacy from the canonical linear PCA solution -- without whitening the inputs. Whitening, a preprocessing step that converts the inputs into unit-variance inputs, has generally been a prerequisite for linear ICA methods, which meant that conventional nonlinear PCA could not necessarily preserve the orthogonality of the overall transformation, could not directly reduce dimensionality, and could not intrinsically order components by variance. We offer insights into the relationship between linear PCA, nonlinear PCA, and linear ICA -- three methods with autoencoder formulations for learning special linear transformations from data: transformations that are (semi-)orthogonal for PCA, and arbitrary unit-variance for ICA. As part of our formulation, nonlinear PCA can be seen as a method that maximizes both variance and statistical independence; it lies in the middle between linear PCA and linear ICA, and serves as a building block for learning linear transformations that are identifiable.
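The subspace rotational indeterminacy described above can be seen directly in a small numerical experiment. The following NumPy sketch (illustrative only, not the paper's method) mixes one distinct-variance source with two equal-variance sources: linear PCA recovers the distinct-variance axis, but inside the equal-variance eigensubspace it returns an essentially arbitrary rotation of the true axes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Three independent sources: one with a distinct variance (9) and
# two with equal variance (1). The setup is hypothetical, chosen
# only to exhibit a degenerate eigensubspace.
s = np.column_stack([
    3.0 * rng.standard_normal(n),                 # variance 9
    rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n),  # variance 1
    rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n),  # variance 1
])

# Mix with a random orthogonal matrix to obtain the observations.
q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
x = s @ q.T

# Linear PCA: eigendecomposition of the sample covariance.
# np.linalg.eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(np.cov(x, rowvar=False))

# The distinct-variance axis is identified: the top principal
# component correlates (up to sign) almost perfectly with source 0.
top_corr = np.corrcoef(x @ eigvecs[:, -1], s[:, 0])[0, 1]

# Inside the 2-D eigensubspace with (numerically) equal eigenvalues,
# the recovered axes are an arbitrary rotation of the true ones: the
# cross-correlation block below is close to a rotation matrix but
# generally far from the identity.
sub_corr = np.corrcoef(s[:, 1:].T, (x @ eigvecs[:, :2]).T)[:2, 2:]
```

The eigenvalue gap protects the variance-9 axis, while which pair of axes PCA returns inside the degenerate subspace is determined only by sampling noise; this is the indeterminacy that $\sigma$-PCA aims to eliminate.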