$\sigma$-PCA: a unified neural model for linear and nonlinear principal component analysis

Linear principal component analysis (PCA), nonlinear PCA, and linear independent component analysis (ICA) are three methods with single-layer autoencoder formulations for learning linear transformations from data. Linear PCA learns orthogonal transformations (rotations) that orient axes to maximise variance, but it suffers from a subspace rotational indeterminacy: it fails to find a unique rotation for axes that share the same variance. Both nonlinear PCA and linear ICA reduce the subspace indeterminacy from rotational to permutational by maximising statistical independence under the assumption of unit variance. The main difference between them is that nonlinear PCA learns only rotations, whereas linear ICA learns any linear transformation with unit variance. The relationship between all three can be understood through the singular value decomposition of the linear ICA transformation into a sequence of rotation, scale, rotation: linear PCA learns the first rotation, nonlinear PCA learns the second, and the scale is simply the inverse of the standard deviations. The problem is that, in contrast to linear PCA, conventional nonlinear PCA cannot be applied directly to the data to learn the first rotation, which is special because it reduces dimensionality and orders components by variance. In this paper, we identify the cause and, as a solution, propose $\sigma$-PCA: a unified neural model for linear and nonlinear PCA as single-layer autoencoders. One of its key ingredients is modelling not just the rotation but also the scale, that is, the variances. This model bridges the disparity between linear and nonlinear PCA. Like linear PCA, it can learn a semi-orthogonal transformation that reduces dimensionality and orders by variances; unlike linear PCA, it does not suffer from rotational indeterminacy.
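The rotation, scale, rotation picture can be illustrated with ordinary PCA whitening. The sketch below is not the $\sigma$-PCA model itself; it uses a plain eigendecomposition in NumPy, and the mixing matrix `A`, the uniform sources, and the helper name `pca_whiten` are assumptions made only for this illustration. It shows the first rotation (ordered by variance), the scale as the inverse of the standard deviations, and why a second rotation is still undetermined by variance alone.

```python
import numpy as np


def pca_whiten(x):
    """Return (W, s, z): PCA rotation, per-axis standard deviations, whitened data."""
    xc = x - x.mean(axis=0)                      # centre the data
    cov = xc.T @ xc / len(xc)                    # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder by descending variance
    s = np.sqrt(eigvals[order])                  # standard deviations along each axis
    W = eigvecs[:, order]                        # first rotation (orthogonal matrix)
    z = (xc @ W) / s                             # rotate, then scale by 1 / std
    return W, s, z


rng = np.random.default_rng(0)

# Hypothetical independent, non-Gaussian, unit-variance sources.
sources = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(5000, 2))

# Hypothetical mixing matrix, chosen only for illustration.
A = np.array([[2.0, 1.0],
              [0.5, 1.5]])
x = sources @ A.T

W, s, z = pca_whiten(x)
cov_z = z.T @ z / len(z)                         # covariance after rotation + scale

# Any further rotation R leaves the covariance at identity, so variance alone
# cannot pick out the second rotation.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
z_rot = z @ R
cov_rot = z_rot.T @ z_rot / len(z_rot)
```

After whitening, `cov_z` is the identity up to floating-point error, and so is `cov_rot` for any angle `theta`. This is the rotational indeterminacy that remains once all axes share the same (unit) variance, and which nonlinear PCA and linear ICA resolve by maximising statistical independence.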

