Penalized Principal Component Analysis using Nesterov Smoothing

Principal components computed via principal component analysis (PCA) are traditionally used to reduce dimensionality in genomic data or to correct for population stratification. In this paper, we explore the penalized eigenvalue problem (PEP), which reformulates the computation of the first eigenvector as an optimization problem and adds an L1 penalty. The contribution of our article is threefold. First, we extend PEP by applying Nesterov smoothing to the original LASSO-type L1 penalty. The smoothed penalty is differentiable, so analytical gradients can be computed, which enables faster and more efficient minimization of the objective function. Second, we demonstrate how higher-order eigenvectors can be calculated with PEP using established results from singular value decomposition (SVD). Third, using data from the 1000 Genomes Project, we empirically demonstrate that our proposed smoothed PEP improves numerical stability and yields meaningful eigenvectors. We further investigate the utility of the penalized eigenvector approach over traditional PCA.
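
To illustrate why smoothing makes analytical gradients available, here is a minimal sketch. It rests on several assumptions that the abstract does not confirm: the objective is written as the negative quadratic form plus a weighted penalty, the matrix Q is symmetric, and the smoothing uses a squared-error prox function, under which the Nesterov-smoothed absolute value reduces to the Huber function. The function names and the parameters lam (penalty weight) and mu (smoothing parameter) are hypothetical, and the unit-norm constraint on v is not handled here.

```python
import numpy as np

def smoothed_abs(x, mu):
    """Huber-type smooth surrogate for |x| (assumed squared-error prox)."""
    x = np.asarray(x, dtype=float)
    # Quadratic near zero, linear (shifted by mu/2) in the tails.
    return np.where(np.abs(x) <= mu, x**2 / (2.0 * mu), np.abs(x) - mu / 2.0)

def smoothed_abs_grad(x, mu):
    """Derivative of smoothed_abs; defined everywhere, unlike sign(x) at 0."""
    x = np.asarray(x, dtype=float)
    return np.clip(x / mu, -1.0, 1.0)

def smoothed_pep_objective(v, Q, lam, mu):
    """Assumed objective: -v'Qv + lam * sum_i smoothed_abs(v_i)."""
    return -v @ Q @ v + lam * np.sum(smoothed_abs(v, mu))

def smoothed_pep_gradient(v, Q, lam, mu):
    """Analytical gradient of the assumed objective (Q symmetric)."""
    return -2.0 * Q @ v + lam * smoothed_abs_grad(v, mu)
```

With this choice the surrogate never exceeds |x| and differs from it by at most mu/2, so a smaller mu gives a closer approximation to the L1 penalty at the cost of a steeper gradient near zero.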

PCA

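In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space into a lower-dimensional one by maximizing the variance along each retained dimension. Given a set of points in two, three, or more dimensions, a "best-fit" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fit line is chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called the principal components.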
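
The description above can be sketched with a singular value decomposition of the centered data matrix. This is a minimal illustration of standard PCA, not the penalized method from the abstract; the function name pca_svd and its return values are hypothetical choices made for this example.

```python
import numpy as np

def pca_svd(X, k):
    """Return the top-k principal components, scores, and variances of X.

    X has one observation per row and one variable per column.
    """
    Xc = X - X.mean(axis=0)                      # center each variable
    # Rows of Vt are orthonormal directions, ordered by singular value.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                          # principal directions
    scores = Xc @ components.T                   # data in the new basis
    explained_variance = s[:k] ** 2 / (X.shape[0] - 1)
    return components, scores, explained_variance
```

The rows of components form the orthogonal basis described above, and the columns of scores are uncorrelated with variances given by explained_variance in decreasing order.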
