Sparse principal component analysis (SPCA) is a popular tool for dimensionality reduction in high-dimensional data. However, theoretically justified Bayesian SPCA methods that also scale well computationally are still lacking. A major challenge in Bayesian SPCA is choosing an appropriate prior for the loadings matrix, given that the principal components are mutually orthogonal. We propose a novel parameter-expanded coordinate ascent variational inference (PX-CAVI) algorithm. The algorithm uses a spike-and-slab prior and incorporates parameter expansion to cope with the orthogonality constraint. In addition to comparing against two popular SPCA approaches, we introduce the PX-EM algorithm, an EM analogue of PX-CAVI, as a further point of comparison. In extensive numerical simulations, the PX-CAVI algorithm outperforms these SPCA approaches. We also study the contraction rate of the variational posterior, which is a novel contribution to the literature. The PX-CAVI algorithm is then applied to a lung cancer gene expression dataset. The R package VBsparsePCA, which implements the algorithm, is available on the Comprehensive R Archive Network (CRAN).
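To make the orthogonality issue concrete, the following is a minimal sketch and not the PX-CAVI algorithm itself. It assumes a generic factor-model form of PCA in which the data covariance depends on the loadings matrix B only through B Bᵀ, so that B and B A give the same covariance for any orthogonal matrix A. The row-wise sparsity pattern, the Laplace slab, and the helper names `sample_spike_and_slab` and `random_rotation` are illustrative assumptions that do not come from the abstract.

```python
import numpy as np


def sample_spike_and_slab(p, r, inclusion_prob, slab_scale, rng):
    """Draw a p x r loadings matrix with row-wise spike-and-slab sparsity.

    Hypothetical helper: each row is either exactly zero (spike) or drawn
    from a Laplace slab. The slab choice is an assumption for illustration.
    """
    # gamma[j] is True when row j is kept (slab) and False when it is zeroed (spike).
    gamma = rng.random(p) < inclusion_prob
    slab = rng.laplace(loc=0.0, scale=slab_scale, size=(p, r))
    return slab * gamma[:, None], gamma


def random_rotation(r, rng):
    """Random r x r orthogonal matrix via QR of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((r, r)))
    return q


rng = np.random.default_rng(0)
B, gamma = sample_spike_and_slab(p=20, r=3, inclusion_prob=0.25,
                                 slab_scale=1.0, rng=rng)
A = random_rotation(3, rng)
rotated = B @ A  # expanded parameterization: loadings times a rotation

# The implied covariance term B B^T is unchanged by the rotation ...
same_cov = np.allclose(B @ B.T, rotated @ rotated.T)
# ... and so is the set of nonzero rows, i.e. the row-wise sparsity pattern.
same_support = np.array_equal(np.any(rotated != 0, axis=1), gamma)

print("covariance term preserved:", same_cov)
print("row support preserved:", same_support)
```

Under these assumptions, a prior can be placed on the unconstrained product B A rather than directly on an orthogonality-constrained matrix, which is one way to read the role of parameter expansion described above.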