Statistical inference in high-dimensional settings is challenging when standard unregularized methods are employed. In this work, we focus on the case of multiple correlated proportions, for which we develop a Bayesian inference framework. For this purpose, we construct an $m$-dimensional Beta distribution from a $2^m$-dimensional Dirichlet distribution, building on work by Olkin and Trikalinos (2015). This readily leads to a multivariate Beta-binomial model for which the simple update rules of the common Dirichlet-multinomial model can be adopted. From the frequentist perspective, this approach amounts to adding pseudo-observations to the data and allows joint shrinkage estimation of the mean vector and covariance matrix. For higher dimensions ($m > 10$), the extensive model based on $2^m$ parameters starts to become numerically infeasible. To counter this problem, we utilize a reduced parametrisation which has only $1 + m(m + 1)/2$ parameters describing first and second order moments. A copula model can then be used to approximate the (posterior) multivariate Beta distribution. A natural inference goal is the construction of multivariate credible regions. The properties of different credible regions are assessed in a simulation study in the context of investigating the accuracy of multiple binary classifiers. It is shown that the extensive and copula approaches lead to a (Bayes) coverage probability very close to the target level. In this regard, they outperform credible regions based on a normal approximation of the posterior distribution, in particular for small sample sizes. Additionally, they always lead to credible regions which lie entirely in the parameter space, which is not the case when the normal approximation is used.
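The construction described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that the $2^m$ Dirichlet cells are indexed by the binary outcome patterns of the $m$ variables and that each proportion is the sum of the cell probabilities whose pattern has a 1 in the corresponding position. All function names, the prior pseudo-count of 0.5 per cell, and the example counts are hypothetical choices made for illustration only.

```python
import itertools
import random


def cell_patterns(m):
    """Return all 2^m binary outcome patterns in lexicographic order."""
    return list(itertools.product((0, 1), repeat=m))


def posterior_concentration(prior, counts):
    """Dirichlet-multinomial update: add observed cell counts to prior pseudo-counts."""
    return [a + n for a, n in zip(prior, counts)]


def marginal_beta_parameters(alpha, patterns, m):
    """Beta(a_j, b_j) parameters of each proportion, by Dirichlet aggregation."""
    total = sum(alpha)
    params = []
    for j in range(m):
        a_j = sum(a for a, pat in zip(alpha, patterns) if pat[j] == 1)
        params.append((a_j, total - a_j))
    return params


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalising independent Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def to_proportions(cell_probs, patterns, m):
    """Map 2^m cell probabilities to the m (correlated) proportions."""
    return [
        sum(p for p, pat in zip(cell_probs, patterns) if pat[j] == 1)
        for j in range(m)
    ]


# Hypothetical example with m = 2 binary classifiers (1 = correct prediction).
m = 2
patterns = cell_patterns(m)      # [(0, 0), (0, 1), (1, 0), (1, 1)]
prior = [0.5] * len(patterns)    # assumed prior pseudo-observations per cell
counts = [3, 2, 4, 11]           # made-up cell counts, same order as patterns
posterior = posterior_concentration(prior, counts)

# Marginal posteriors: Beta(16, 6) for the first and Beta(14, 8) for the second proportion.
marginals = marginal_beta_parameters(posterior, patterns, m)

# Joint posterior draws of the two proportions; each draw lies in [0, 1]^2.
rng = random.Random(0)
draws = [to_proportions(sample_dirichlet(posterior, rng), patterns, m)
         for _ in range(1000)]
```

Because every draw is a sum of Dirichlet cell probabilities, it lies inside the unit hypercube by construction, which is consistent with the abstract's remark that the resulting credible regions stay within the parameter space. The sketch enumerates all $2^m$ cells and is therefore only practical for small $m$.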