Factor analysis provides a canonical framework for imposing lower-dimensional structure, such as sparse covariance, on high-dimensional data. High-dimensional data on the same set of variables are often collected under different conditions, for instance when studies are replicated across research groups. In such cases, it is natural to seek to learn the shared versus condition-specific structure. Hierarchical extensions of factor analysis have been proposed, but they face practical issues, including identifiability problems. To address these shortcomings, we propose a class of SUbspace Factor Analysis (SUFA) models, which characterize variation across groups at the level of a lower-dimensional subspace. We prove that the proposed class of SUFA models leads to identifiability of the shared versus group-specific components of the covariance, and we study their posterior contraction properties. Taking a Bayesian approach, we develop these contributions alongside efficient posterior computation algorithms. Our sampler fully integrates out latent variables, is easily parallelizable, and has complexity that does not depend on sample size. We illustrate the methods through an application to the integration of multiple gene expression datasets relevant to immunology.