Principal components computed via principal component analysis (PCA) are traditionally used to reduce dimensionality in genomic data or to correct for population stratification. In this paper, we explore the penalized eigenvalue problem (PEP), which reformulates the computation of the first eigenvector as an optimization problem and adds an L1 penalty constraint. Our contribution is threefold. First, we extend PEP by applying Nesterov smoothing to the original LASSO-type L1 penalty. The smoothed penalty is differentiable, so analytical gradients can be computed, which enables faster and more efficient minimization of the objective function associated with the optimization problem. Second, we demonstrate how higher-order eigenvectors can be calculated with PEP using established results from singular value decomposition (SVD). Third, using data from the 1000 Genomes Project, we empirically demonstrate that the proposed smoothed PEP increases numerical stability and yields meaningful eigenvectors. We further investigate the utility of the penalized eigenvector approach over traditional PCA.
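To make the role of the smoothing concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes the PEP objective has the form -v'Qv + lam * ||v||_1 with a unit-norm constraint on v, and it assumes Nesterov smoothing with a quadratic prox function, under which the smoothed absolute value takes the Huber form. The function names, the projected gradient scheme, and the parameters `lam`, `mu`, `step`, and `n_iter` are all hypothetical choices made for illustration.

```python
import numpy as np


def smoothed_l1(v, mu):
    """Nesterov-smoothed L1 norm, assuming a quadratic prox function.

    Each |v_i| is replaced by v_i^2 / (2 mu) when |v_i| <= mu and by
    |v_i| - mu / 2 otherwise (the Huber form).
    """
    a = np.abs(v)
    return np.sum(np.where(a <= mu, a ** 2 / (2.0 * mu), a - mu / 2.0))


def smoothed_l1_grad(v, mu):
    """Analytical gradient of smoothed_l1: v / mu clipped to [-1, 1]."""
    return np.clip(v / mu, -1.0, 1.0)


def smoothed_pep(Q, lam, mu=0.1, step=0.01, n_iter=500, seed=0):
    """Projected gradient descent on the assumed smoothed PEP objective.

    Minimizes -v'Qv + lam * smoothed_l1(v, mu) over unit vectors v,
    for a symmetric matrix Q (for example, a covariance matrix).
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(Q.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        # Gradient of the smooth objective, available in closed form.
        grad = -2.0 * Q @ v + lam * smoothed_l1_grad(v, mu)
        v = v - step * grad
        # Project back onto the unit sphere.
        v /= np.linalg.norm(v)
    return v
```

With `lam = 0` the update reduces to a shifted power iteration, so the result should approach the leading eigenvector of `Q` up to sign. Note that a smoothed penalty shrinks small entries toward zero but does not generally set them exactly to zero, so any thresholding step would be a separate design choice.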