Posterior computation in hierarchical Dirichlet process (HDP) mixture models is an active area of research in nonparametric Bayesian inference for grouped data. The existing literature focuses almost exclusively on the Chinese restaurant franchise (CRF) analogy of the marginal distribution of the parameters, an approach that can mix poorly and whose computational complexity is known to be linear in the sample size. A recently developed slice sampler allows for efficient blocked updates of the parameters, but we show in this article that it is statistically unstable. We develop a blocked Gibbs sampler for the posterior distribution of the HDP that produces statistically stable results, is highly scalable with respect to sample size, and is shown to mix well. The heart of the construction is to endow the shared concentration parameter with an appropriately chosen gamma prior, which breaks the dependence among the shared mixing proportions and permits independent updates of certain log-concave random variables in a block. En route, we develop an efficient rejection sampler for these random variables that leverages piecewise tangent-line approximations.
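To illustrate the general idea behind rejection sampling with piecewise tangent-line approximations, the following is a minimal sketch and not the sampler developed in the article. It assumes a strictly concave log-density h with known derivative on a bounded interval, takes tangent lines at a fixed set of points, and uses the resulting piecewise-exponential envelope as the proposal. The function name `make_tangent_sampler`, its arguments, and the standard normal target on [-6, 6] are hypothetical choices made only for this example.

```python
import math
import random


def make_tangent_sampler(h, dh, points, lo, hi, rng):
    """Build a sampler for the density proportional to exp(h(x)) on [lo, hi].

    Assumes h is strictly concave with derivative dh, and that the tangent
    points lie strictly inside (lo, hi). Illustrative only.
    """
    xs = sorted(points)
    hv = [h(x) for x in xs]   # log-density at the tangent points
    sl = [dh(x) for x in xs]  # tangent slopes (strictly decreasing)

    # Breakpoints: support ends plus intersections of neighbouring tangents.
    z = [lo]
    for j in range(len(xs) - 1):
        num = hv[j + 1] - hv[j] - xs[j + 1] * sl[j + 1] + xs[j] * sl[j]
        z.append(num / (sl[j] - sl[j + 1]))
    z.append(hi)

    def upper(j, x):
        # Tangent line at xs[j]; by concavity it lies on or above h.
        return hv[j] + sl[j] * (x - xs[j])

    # Mass of exp(tangent) on each segment [z[j], z[j + 1]].
    mass = []
    for j in range(len(xs)):
        w = z[j + 1] - z[j]
        base = math.exp(upper(j, z[j]))
        if abs(sl[j]) < 1e-12:
            mass.append(base * w)
        else:
            mass.append(base * math.expm1(sl[j] * w) / sl[j])

    def draw():
        while True:
            # Pick a segment in proportion to its envelope mass.
            j = rng.choices(range(len(xs)), weights=mass)[0]
            s, w, u = sl[j], z[j + 1] - z[j], rng.random()
            # Invert the CDF of the exponential piece on this segment.
            if abs(s) < 1e-12:
                x = z[j] + u * w
            else:
                x = z[j] + math.log1p(u * math.expm1(s * w)) / s
            # Accept with probability exp(h(x) - envelope(x)) <= 1.
            if rng.random() < math.exp(h(x) - upper(j, x)):
                return x

    return draw


# Example target (an assumption): standard normal kernel truncated to [-6, 6].
rng = random.Random(0)
draw = make_tangent_sampler(
    h=lambda x: -0.5 * x * x,
    dh=lambda x: -x,
    points=[-2.0, -0.5, 0.5, 2.0],
    lo=-6.0,
    hi=6.0,
    rng=rng,
)
samples = [draw() for _ in range(20000)]
```

Because the envelope is built from exponentials of straight lines, each segment can be integrated and inverted in closed form, so proposals are cheap. Adding tangent points tightens the envelope and raises the acceptance rate at the cost of a larger lookup.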