In this work, we derive sharp non-asymptotic deviation bounds for weighted sums of Dirichlet random variables. These bounds are based on a novel integral representation of the density of a weighted Dirichlet sum. This representation allows us to obtain a Gaussian-like approximation for the distribution of the sum using methods from geometry and complex analysis. Our results generalize similar bounds for the Beta distribution obtained in the seminal paper of Alfers and Dinges [1984]. Additionally, our results can be considered a sharp non-asymptotic version of the inverse of Sanov's theorem studied by Ganesh and O'Connell [1999] in the Bayesian setting. Based on these results, we derive new deviation bounds for Dirichlet process posterior means, with application to the Bayesian bootstrap. Finally, we apply our estimates to the analysis of the Multinomial Thompson Sampling (TS) algorithm in multi-armed bandits and significantly sharpen the existing regret bounds by making them independent of the size of the support of the arm distributions.
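
To make the central object concrete, the following minimal sketch simulates a weighted Dirichlet sum $\sum_i w_i f_i$ with $w \sim \mathrm{Dir}(\alpha)$ and estimates an upper-tail probability by Monte Carlo. It is an illustration only and does not implement the bounds derived in this work. The function names, the example values of `alpha` and `f`, and the moment-matched Gaussian used as a crude reference are all assumptions introduced here; only the standard closed-form mean and variance of a linear functional of a Dirichlet vector are used.

```python
import math
import random
from statistics import NormalDist


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) vector by normalizing independent Gamma draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def weighted_dirichlet_sum(alpha, f, rng):
    """One draw of sum_i w_i * f_i with w ~ Dirichlet(alpha)."""
    w = sample_dirichlet(alpha, rng)
    return sum(wi * fi for wi, fi in zip(w, f))


def exact_mean_and_variance(alpha, f):
    """Closed-form mean and variance of the weighted Dirichlet sum."""
    a0 = sum(alpha)
    p = [a / a0 for a in alpha]
    mean = sum(pi * fi for pi, fi in zip(p, f))
    second_moment = sum(pi * fi * fi for pi, fi in zip(p, f))
    return mean, (second_moment - mean ** 2) / (a0 + 1.0)


def empirical_tail(alpha, f, threshold, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(sum_i w_i * f_i >= threshold)."""
    rng = random.Random(seed)
    hits = sum(
        weighted_dirichlet_sum(alpha, f, rng) >= threshold
        for _ in range(n_samples)
    )
    return hits / n_samples


def gaussian_reference_tail(alpha, f, threshold):
    """Tail of a moment-matched Gaussian: a rough reference, not a bound."""
    mean, var = exact_mean_and_variance(alpha, f)
    return 1.0 - NormalDist(mean, math.sqrt(var)).cdf(threshold)


if __name__ == "__main__":
    alpha = [2.0, 3.0, 5.0]   # hypothetical Dirichlet parameters
    f = [0.0, 0.5, 1.0]       # hypothetical weights (support points)
    for u in (0.7, 0.8, 0.9):
        print(u, empirical_tail(alpha, f, u), gaussian_reference_tail(alpha, f, u))
```

Comparing the two columns printed for increasing thresholds gives a rough sense of how far a naive Gaussian tail can drift from the simulated one, which is the regime where sharp non-asymptotic bounds are of interest.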