Bayesian inference for Dirichlet-Multinomial (DM) models has a long and important history. The concentration parameter $\alpha$ controls how strongly category probabilities in the multinomial distribution are smoothed, and it is crucial for subsequent inference. Because its marginal likelihood lacks a tractable form, $\alpha$ is often chosen ad hoc or estimated with approximation algorithms. A constant $\alpha$ often leads to inadequate smoothing of probabilities, particularly for sparse compositional count datasets. In this paper, we introduce a novel class of prior distributions that facilitates conjugate updating of the concentration parameter, allowing for full Bayesian inference for DM models. Our methodology is based on fast residue computation and admits closed-form posterior moments in specific scenarios. In addition, the heavy tail and substantial mass around zero of our prior provide continuous shrinkage, so that it adapts to sparsity or quasi-sparsity in the data. We demonstrate the usefulness of our approach on simulated examples and on a real-world human microbiome dataset. Finally, we conclude with directions for future research.
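To make the role of $\alpha$ concrete, the following is a minimal sketch of the standard DM quantities for a fixed $\alpha$, not the prior or residue-based method proposed in this paper. It assumes a symmetric Dirichlet prior with a single concentration $\alpha$ shared by all $K$ categories; the function names `posterior_mean` and `dm_log_marginal` are hypothetical and chosen only for illustration.

```python
from math import lgamma, log

def posterior_mean(counts, alpha):
    """Smoothed category probabilities (x_j + alpha) / (n + K * alpha).

    Assumes a symmetric Dirichlet(alpha, ..., alpha) prior (an assumption
    of this sketch, not a statement about the paper's prior).
    """
    n = sum(counts)
    k = len(counts)
    return [(x + alpha) / (n + k * alpha) for x in counts]

def dm_log_marginal(counts, alpha):
    """Log Dirichlet-multinomial probability of the counts for a fixed alpha."""
    n = sum(counts)
    k = len(counts)
    # Multinomial coefficient n! / prod_j x_j!
    log_coef = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # Gamma(K * alpha) / Gamma(n + K * alpha)
    log_ratio = lgamma(k * alpha) - lgamma(n + k * alpha)
    # prod_j Gamma(x_j + alpha) / Gamma(alpha)
    log_prod = sum(lgamma(x + alpha) - lgamma(alpha) for x in counts)
    return log_coef + log_ratio + log_prod

# Illustrative sparse counts (made up for this sketch, not from the paper).
counts = [3, 0, 1]
for alpha in (0.01, 1.0, 10.0):
    print(alpha, posterior_mean(counts, alpha), dm_log_marginal(counts, alpha))
```

A small $\alpha$ leaves the zero-count category with almost no probability, while a large $\alpha$ pulls every category toward $1/K$. Because the log marginal depends on $\alpha$ only through ratios of gamma functions, this is the quantity for which a tractable conjugate update in $\alpha$ is hard to obtain.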