Latent Dirichlet Allocation (LDA) is a foundational model for discovering latent thematic structure in discrete data, but its Dirichlet prior cannot represent the rich correlations and hierarchical relationships often present among topics. We introduce Latent Dirichlet-Tree Allocation (LDTA), a generalization of LDA that replaces the Dirichlet prior with an arbitrary Dirichlet-Tree (DT) distribution. LDTA preserves LDA's generative structure while enabling expressive, tree-structured priors over topic proportions. For inference, we develop universal mean-field variational inference and Expectation Propagation algorithms that provide tractable updates for any DT distribution. Our theoretical development shows that both inference methods can be expressed in vectorized form, and we provide fully vectorized, GPU-accelerated implementations. The resulting framework substantially expands the modeling capacity of LDA while maintaining scalability and computational efficiency.
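To make the generative structure concrete, the following is a minimal sketch of how a document could be drawn when topic proportions follow a Dirichlet-Tree prior. It is illustrative only and is not the authors' implementation: the tree `TREE`, its concentration values, and the helper names `sample_dirichlet_tree` and `generate_document` are assumptions introduced here. Each internal node draws a Dirichlet over its children, and a leaf's topic proportion is the product of the branch probabilities along its path from the root; the remaining steps follow the standard LDA generative process.

```python
import numpy as np

# Hypothetical tree (an assumption for illustration): each internal node maps
# to a list of (child, concentration) pairs. Integer children are leaves and
# serve as topic indices; string children are internal nodes.
TREE = {
    "root": [("A", 2.0), ("B", 1.0)],
    "A": [(0, 1.0), (1, 1.0)],
    "B": [(2, 0.5), (3, 0.5), (4, 0.5)],
}


def sample_dirichlet_tree(tree, rng, root="root"):
    """Draw topic proportions theta from a Dirichlet-Tree distribution."""
    n_leaves = sum(
        1 for children in tree.values() for child, _ in children if child not in tree
    )
    theta = np.zeros(n_leaves)
    stack = [(root, 1.0)]  # (node, probability mass reaching that node)
    while stack:
        node, mass = stack.pop()
        children, alphas = zip(*tree[node])
        branch = rng.dirichlet(alphas)  # branch probabilities at this node
        for child, p in zip(children, branch):
            if child in tree:
                stack.append((child, mass * p))  # internal node: keep descending
            else:
                theta[child] = mass * p  # leaf: product of path probabilities
    return theta


def generate_document(tree, beta, n_words, rng):
    """LDA-style generation with a Dirichlet-Tree prior on theta.

    beta is a (n_topics, vocab_size) array of per-topic word distributions.
    """
    theta = sample_dirichlet_tree(tree, rng)
    z = rng.choice(len(theta), size=n_words, p=theta)  # topic per word
    words = np.array([rng.choice(beta.shape[1], p=beta[k]) for k in z])
    return theta, z, words


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    beta = rng.dirichlet(np.ones(20), size=5)  # 5 topics, 20-word vocabulary
    theta, z, words = generate_document(TREE, beta, n_words=30, rng=rng)
    print(theta, theta.sum())
```

A tree with a single internal node whose children are all leaves reduces to an ordinary Dirichlet, so the sketch recovers the standard LDA prior as a special case. Deeper trees let groups of topics share probability mass through a common ancestor, which is how the prior can express correlations among topics.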