Posterior computation in hierarchical Dirichlet process (HDP) mixture models is an active area of research in nonparametric Bayesian inference for grouped data. The existing literature focuses almost exclusively on the Chinese restaurant franchise (CRF) analogy of the marginal distribution of the parameters, which can mix poorly and scales quadratically with the sample size. A recently developed slice sampler allows efficient blocked updates of the parameters, but we show in this article that it is statistically unstable. We develop a blocked Gibbs sampler that employs a truncated approximation of the underlying random measures to sample from the posterior distribution of the HDP; it produces statistically stable results, is highly scalable with respect to sample size, and is shown to have good mixing. The heart of the construction is to endow the shared concentration parameter with an appropriately chosen gamma prior, which allows us to break the dependence among the shared mixing proportions and permits independent updates of certain log-concave random variables in a block. En route, we develop an efficient rejection sampler for these random variables leveraging piecewise tangent-line approximations.
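The tangent-line rejection idea mentioned in the last sentence can be illustrated generically. Below is a minimal, non-adaptive sketch (not the paper's implementation): for a log-concave log-density `h`, tangent lines at a few fixed abscissae form a piecewise-linear upper bound on `h`, giving a piecewise-exponential proposal that can be sampled exactly by inverse CDF and corrected by rejection. The function name, signature, and the choice of tangent points are illustrative assumptions.

```python
import math
import random

def tangent_rejection_sampler(h, h_prime, points, rng):
    """Draw one sample from the (unnormalized) log-concave density exp(h)
    by rejection against a piecewise-exponential envelope built from
    tangent lines of h at fixed abscissae (a non-adaptive ARS-style sketch).

    Assumes the abscissae bracket the mode, so the first tangent slope is
    positive and the last is negative (envelope integrates to a finite mass).
    """
    xs = sorted(points)
    hs = [h(x) for x in xs]
    ms = [h_prime(x) for x in xs]          # tangent slopes (assumed nonzero)
    bs = [hs[j] - ms[j] * xs[j] for j in range(len(xs))]  # tangent intercepts

    # Intersections z[j] of consecutive tangent lines; on (z[j], z[j+1])
    # the envelope is the tangent at xs[j].  Concavity of h guarantees
    # every tangent line lies above h, so the envelope is valid.
    z = [-math.inf]
    for j in range(len(xs) - 1):
        z.append((bs[j + 1] - bs[j]) / (ms[j] - ms[j + 1]))
    z.append(math.inf)

    def exp_at(m, x):
        # exp(m*x) with the infinite endpoints mapping to 0 (valid given
        # the slope-sign assumption above).
        return 0.0 if math.isinf(x) else math.exp(m * x)

    # Mass of each envelope piece: integral of exp(b + m*x) over (lo, hi).
    masses = [
        math.exp(bs[j]) * (exp_at(ms[j], z[j + 1]) - exp_at(ms[j], z[j])) / ms[j]
        for j in range(len(xs))
    ]
    total = sum(masses)

    while True:
        # Pick a piece proportional to its envelope mass.
        u = rng.random() * total
        j = len(masses) - 1
        for k, w in enumerate(masses):
            if u <= w:
                j = k
                break
            u -= w
        m, lo, hi = ms[j], z[j], z[j + 1]
        # Inverse-CDF draw from the truncated exponential piece.
        e_lo, e_hi = exp_at(m, lo), exp_at(m, hi)
        x = math.log(e_lo + rng.random() * (e_hi - e_lo)) / m
        # Accept with probability exp(h(x) - envelope(x)) <= 1.
        if math.log(rng.random()) <= h(x) - (bs[j] + m * x):
            return x
```

As a sanity check, sampling a standard normal (`h(x) = -x**2/2`, `h'(x) = -x`) with tangent points at `[-2, -0.7, 0.7, 2]` recovers mean near 0 and variance near 1. The adaptive version used in practice would additionally insert each rejected point as a new abscissa, tightening the envelope as sampling proceeds.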