Dirichlet distributions are probability measures on the unit simplex. They are often used as prior distributions when modeling categorical data, for example in topic analysis of text data. Motivated by this application, we consider Monte Carlo estimation of expectations $\mathbb{E}[\exp(nH(\theta))]$, where $\theta$ has a Dirichlet distribution, $H$ is a real-valued function, and $n$ is a parameter. We develop variance reduction techniques designed specifically to work well for large $n$. Our analysis is guided by the Laplace method for approximating integrals, which we extend to fit our problem setting. We develop an importance sampling method that achieves near-optimal asymptotic relative error, and we use related ideas to select a provably effective control variate. We illustrate these results with an application to topic analysis.
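The abstract does not spell out the estimators, so the following is only a minimal sketch under a hypothetical choice of $H$ that does not come from the paper: $H(\theta) = \sum_i p_i \log \theta_i$ for a fixed probability vector $p$. For this $H$ the target has the closed form $\mathbb{E}[\exp(nH(\theta))] = B(\alpha + np)/B(\alpha)$, where $\theta \sim \mathrm{Dirichlet}(\alpha)$ and $B$ is the multivariate beta function, which makes it easy to check estimators. The code compares plain Monte Carlo with importance sampling from a Dirichlet proposal. With the proposal $\mathrm{Dirichlet}(\alpha + np)$, which for this particular $H$ is exactly the tilted density $\propto \exp(nH(\theta)) f_\alpha(\theta)$, every importance weight equals the target value, so it serves as an idealized reference point. It is not the authors' Laplace-based method, and for a general $H$ the tilted density is generally not a Dirichlet distribution. All function names (`naive_mc`, `importance_sampling`, etc.) are hypothetical.

```python
import math
import numpy as np


def log_multivariate_beta(a):
    """log B(a) = sum_i lgamma(a_i) - lgamma(sum_i a_i)."""
    return sum(math.lgamma(x) for x in a) - math.lgamma(float(np.sum(a)))


def dirichlet_logpdf(theta, a):
    """Row-wise log density of Dirichlet(a) at the rows of theta."""
    return np.log(theta) @ (a - 1.0) - log_multivariate_beta(a)


def H(theta, p):
    """Hypothetical example H(theta) = sum_i p_i * log(theta_i), evaluated row-wise."""
    return np.log(theta) @ p


def exact_value(alpha, p, n):
    """Closed form for this H only: E[prod_i theta_i^(n p_i)] = B(alpha + n p) / B(alpha)."""
    return math.exp(log_multivariate_beta(alpha + n * p) - log_multivariate_beta(alpha))


def mean_and_rel_err(y):
    """Sample mean and estimated relative standard error of that mean."""
    est = float(np.mean(y))
    return est, float(np.std(y, ddof=1)) / (math.sqrt(len(y)) * est)


def naive_mc(alpha, p, n, num_samples, rng):
    """Plain Monte Carlo: average exp(n H(theta)) over theta ~ Dirichlet(alpha)."""
    theta = rng.dirichlet(alpha, size=num_samples)
    return mean_and_rel_err(np.exp(n * H(theta, p)))


def importance_sampling(alpha, p, n, beta, num_samples, rng):
    """Sample theta ~ Dirichlet(beta); average exp(n H(theta)) * f_alpha(theta) / f_beta(theta)."""
    theta = rng.dirichlet(beta, size=num_samples)
    # Work with log weights to avoid overflow/underflow for large n.
    log_w = n * H(theta, p) + dirichlet_logpdf(theta, alpha) - dirichlet_logpdf(theta, beta)
    return mean_and_rel_err(np.exp(log_w))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha = np.array([1.0, 1.0, 1.0])
    p = np.array([0.5, 0.3, 0.2])
    n = 50
    print("exact:", exact_value(alpha, p, n))
    print("naive (estimate, rel. err.):", naive_mc(alpha, p, n, 100_000, rng))
    # Proposal Dirichlet(alpha + n p) is the tilted density for this H, so all weights coincide.
    print("IS    (estimate, rel. err.):",
          importance_sampling(alpha, p, n, alpha + n * p, 100_000, rng))
```

Running the script prints the exact value next to both estimates. As $n$ grows, $\exp(nH(\theta))$ is dominated by rare draws near the maximizer of $H$, so the plain estimator's relative error deteriorates, while a proposal concentrated in that region keeps it small; this is the large-$n$ regime the abstract targets.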