Bayesian factor models are widely used for dimensionality reduction and pattern discovery in high-dimensional datasets across diverse fields. These models typically focus on imposing priors on the factor loadings to induce sparsity and improve interpretability. However, the factor scores, which play a critical role in individual-level associations with factors, have received less attention and are assumed to follow a standard multivariate normal distribution. This oversimplification fails to capture the heterogeneity observed in real-world applications. We propose the Sparse Bayesian Factor Model with Mass-Nonlocal Factor Scores (BFMAN), a novel framework that addresses these limitations by introducing a mass-nonlocal prior for the factor scores. This prior yields a more flexible posterior distribution that captures individual heterogeneity while assigning positive probability to the value zero. The zero entries in the score matrix characterize its sparsity, offering a robust and novel approach for determining the optimal number of factors. Model parameters are estimated using a fast and efficient Gibbs sampler. Extensive simulations demonstrate that BFMAN outperforms standard Bayesian sparse factor models in factor recovery, sparsity detection, and score estimation. We apply BFMAN to the Hispanic Community Health Study/Study of Latinos to identify dietary patterns and their associations with cardiovascular outcomes, showcasing the model's ability to uncover meaningful insights into diet.
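To make the idea of a mass-nonlocal prior concrete, the following is a minimal illustrative sketch, not the authors' implementation. The abstract does not state the exact form of the prior, so the sketch assumes a mixture of a point mass at zero and a first-order moment (pMOM) nonlocal density proportional to x^2 N(x; 0, tau). The function name `sample_mass_nonlocal` and the parameters `pi_zero` (prior probability of an exact zero) and `tau` (scale) are hypothetical names introduced here for illustration.

```python
import math
import random


def sample_mass_nonlocal(n, pi_zero=0.5, tau=1.0, seed=None):
    """Draw n values from an ASSUMED mass-nonlocal mixture.

    With probability pi_zero the draw is exactly 0 (the point mass).
    Otherwise it comes from a first-order moment (pMOM) density,
    proportional to x^2 * N(x; 0, tau), which vanishes at x = 0.
    This specific form is an assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        if rng.random() < pi_zero:
            draws.append(0.0)  # point mass at zero
            continue
        # Under the pMOM density, x^2 / tau is chi-square with 3 degrees
        # of freedom, i.e. Gamma(shape=1.5, scale=2).
        chi2_3 = rng.gammavariate(1.5, 2.0)
        sign = 1.0 if rng.random() < 0.5 else -1.0
        draws.append(sign * math.sqrt(tau * chi2_3))
    return draws


scores = sample_mass_nonlocal(10_000, pi_zero=0.4, tau=1.0, seed=0)
zero_fraction = sum(s == 0.0 for s in scores) / len(scores)
nonzero = [s for s in scores if s != 0.0]
mean_square_nonzero = sum(s * s for s in nonzero) / len(nonzero)
```

Under these assumptions, roughly a fraction `pi_zero` of the simulated scores are exactly zero, while the nonzero scores are pushed away from zero because the nonlocal density vanishes there. This separation is what lets exact zeros in a score matrix be read as sparsity rather than as small noisy values.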