Empirical Bayes Covariance Decomposition, and a solution to the Multiple Tuning Problem in Sparse PCA

Sparse Principal Components Analysis (PCA) has been proposed as a way to improve both interpretability and reliability of PCA. However, use of sparse PCA in practice is hindered by the difficulty of tuning the multiple hyperparameters that control the sparsity of different PCs (the "multiple tuning problem", MTP). Here we present a solution to the MTP using Empirical Bayes methods. We first introduce a general formulation for penalized PCA of a data matrix $\mathbf{X}$, which includes some existing sparse PCA methods as special cases. We show that this formulation also leads to a penalized decomposition of the covariance (or Gram) matrix, $\mathbf{X}^T\mathbf{X}$. We introduce empirical Bayes versions of these penalized problems, in which the penalties are determined by prior distributions that are estimated from the data by maximum likelihood rather than cross-validation. The resulting "Empirical Bayes Covariance Decomposition" provides a principled and efficient solution to the MTP in sparse PCA, and one that can be immediately extended to incorporate other structural assumptions (e.g. non-negative PCA). We illustrate the effectiveness of this approach on both simulated and real data examples.
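To make the multiple tuning problem concrete, the following is a minimal sketch of a generic sparse PCA scheme, not the Empirical Bayes method of the paper. It assumes an L1-style penalty applied by soft-thresholding and simple deflation between components. The names `soft_threshold`, `sparse_rank1`, `sparse_pca`, and the penalty list `lams` are hypothetical and chosen only for illustration. The point is that each PC gets its own penalty, so extracting K components leaves K hyperparameters to tune.

```python
import numpy as np

def soft_threshold(z, lam):
    """Elementwise soft-thresholding, the proximal operator of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sparse_rank1(X, lam, n_iter=100):
    """One sparse loading vector via alternating updates (illustrative only)."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    v = vt[0]  # start from the ordinary leading PC loading
    for _ in range(n_iter):
        u = X @ v
        u_norm = np.linalg.norm(u)
        if u_norm == 0.0:
            break
        u = u / u_norm
        v_new = soft_threshold(X.T @ u, lam)
        v_norm = np.linalg.norm(v_new)
        if v_norm == 0.0:
            return np.zeros_like(v)  # the penalty removed every loading
        v = v_new / v_norm
    return v

def sparse_pca(X, lams):
    """Extract len(lams) sparse loadings, one hypothetical penalty per PC."""
    R = np.array(X, dtype=float)
    loadings = []
    for lam in lams:
        v = sparse_rank1(R, lam)
        loadings.append(v)
        R = R - np.outer(R @ v, v)  # deflate: remove the fitted component
    return np.column_stack(loadings)

# Simulated data with one planted sparse component on the first 3 variables.
rng = np.random.default_rng(0)
n, p = 200, 20
v_true = np.zeros(p)
v_true[:3] = 1.0 / np.sqrt(3.0)
X = 5.0 * np.outer(rng.standard_normal(n), v_true) + rng.standard_normal((n, p))

# Two PCs means two penalties to choose; the values here are arbitrary.
V = sparse_pca(X, lams=[8.0, 2.0])
```

In this sketch the entries of `lams` would typically be chosen by a grid search with cross-validation, whose cost grows quickly with the number of components. The abstract's proposal is to replace that search by estimating the priors that determine the penalties from the data by maximum likelihood.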


PCA

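In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space into a lower-dimensional space while maximizing the variance along each retained dimension. Given a set of points in two, three, or more dimensions, a "best-fit" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fit line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.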
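The description above can be checked numerically. The sketch below computes ordinary (non-sparse) PCA through the singular value decomposition of the centered data matrix, which is one standard way to obtain the principal components. The function name `pca` and its return values are assumptions made for this example.

```python
import numpy as np

def pca(X, k):
    """Return the first k principal components, the scores, and their variances."""
    Xc = X - X.mean(axis=0)                  # center each variable
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:k].T                    # orthonormal basis vectors, one per column
    scores = Xc @ components                 # data projected onto the components
    explained_var = s[:k] ** 2 / (X.shape[0] - 1)
    return components, scores, explained_var

# Correlated toy data: 100 observations of 5 variables.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 5))
W, Z, var = pca(X, 3)

# The sample covariance of the scores is diagonal: the projected
# dimensions are uncorrelated, and the variances are in decreasing order.
C = np.cov(Z, rowvar=False)
```

Here the columns of `W` are the principal components, and the diagonal of `C` matches `var`, the variance captured along each direction.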
