Deep Kernel Principal Component Analysis for Multi-level Feature Learning

Principal Component Analysis (PCA) and its nonlinear extension Kernel PCA (KPCA) are widely used across science and industry for data analysis and dimensionality reduction. Modern deep learning tools have achieved great empirical success, but a framework for deep principal component analysis is still lacking. Here we develop a deep kernel PCA methodology (DKPCA) to extract multiple levels of the most informative components of the data. Our scheme effectively identifies new hierarchical variables, called deep principal components, that capture the main characteristics of high-dimensional data through a simple and interpretable numerical optimization. We couple the principal components of multiple KPCA levels and show theoretically that DKPCA creates both forward and backward dependency across levels, which has not been explored in kernel methods and yet is crucial for extracting more informative features. Experimental evaluations on multiple data types show that DKPCA finds more efficient and disentangled representations than shallow KPCA, with higher explained variance in fewer principal components. We demonstrate that our method allows for effective hierarchical data exploration and can separate the key generative factors of the input data, both for large datasets and when few training samples are available. Overall, DKPCA can facilitate the extraction of useful patterns from high-dimensional data by learning more informative features organized in different levels, offering diverse views of the factors of variation in the data while maintaining a simple mathematical formulation.

PCA

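In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each retained dimension. Given a set of points in two, three, or more dimensions, the "best-fit" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fit line is chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called the principal components.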
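The sequential procedure described above (find the best-fit direction, then the next one perpendicular to it) can be sketched in plain Python. This is only an illustration of ordinary linear PCA, not the DKPCA method of the paper. It assumes power iteration with deflation on the sample covariance matrix as the way to find each direction, and all names (`pca`, `top_eigenpair`, `SAMPLE_POINTS`) and the toy data are hypothetical.

```python
import math
import random

# Hypothetical toy data: 2-D points scattered roughly along the line y = x.
SAMPLE_POINTS = [
    (2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
    (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9),
]

def mean_center(points):
    """Subtract the per-dimension mean from every point."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    return [[p[j] - means[j] for j in range(d)] for p in points]

def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centered points."""
    n, d = len(centered), len(centered[0])
    return [[sum(p[i] * p[j] for p in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def mat_vec(matrix, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]

def top_eigenpair(matrix, iterations=500, seed=0):
    """Power iteration: dominant eigenvalue and unit eigenvector of a
    symmetric positive semi-definite matrix."""
    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.5) for _ in matrix]  # random start direction
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left; the direction is arbitrary
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue for a unit vector v.
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v

def pca(points, n_components):
    """Return (components, variances), ordered by decreasing variance."""
    cov = covariance(mean_center(points))
    d = len(cov)
    components, variances = [], []
    for _ in range(n_components):
        lam, v = top_eigenpair(cov)
        components.append(v)
        variances.append(lam)
        # Deflation: remove the variance along v so that the next pass
        # finds the best direction perpendicular to those found so far.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return components, variances

def project(points, components):
    """Coordinates of the mean-centered points in the principal basis."""
    return [[sum(a * b for a, b in zip(p, c)) for c in components]
            for p in mean_center(points)]

if __name__ == "__main__":
    comps, variances = pca(SAMPLE_POINTS, 2)
    total = sum(variances)
    for c, var in zip(comps, variances):
        print("direction", c, "explained variance ratio", var / total)
```

The explained variance ratio printed at the end is the share of total variance captured by each component, the same quantity the abstract uses to compare DKPCA with shallow KPCA. For real workloads an eigendecomposition or SVD routine from a numerical library would normally replace the hand-written power iteration.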
