In contingency table analysis, goodness-of-fit tests are used to assess whether a model of interest (e.g., the independence or symmetry model) holds. When the null hypothesis that the model holds is rejected, interest turns to how far the probability structure of the contingency table departs from the model. Many indexes have been studied to measure the degree of this departure, such as the Yule coefficient and the Cramér coefficient for the independence model, and Tomizawa's symmetry index for the symmetry model. Inference for these indexes is usually based on sample proportions, which estimate the cell probabilities, but it is well known that the bias and mean squared error (MSE) of the resulting estimators become large when the sample size is insufficient. To address this problem, this study proposes a new estimator for such indexes based on Bayesian estimators of the cell probabilities. Assuming a Dirichlet prior on the cell probabilities, we asymptotically evaluate the MSE obtained when the posterior means of the cell probabilities are plugged into the index, and we propose an estimator of the index that uses the Dirichlet hyperparameter minimizing this MSE. Numerical experiments show that, when the number of observations per cell is small, the proposed method has smaller bias and MSE than other methods for improving estimation accuracy. We also show that its bias and MSE are smaller than those obtained with the uniform and Jeffreys priors.
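The plug-in idea described above can be illustrated with a minimal sketch. It assumes a symmetric Dirichlet prior with a single hyperparameter `alpha` and uses the standard definition of Cramér's V as the index; the function names are hypothetical, and the MSE-minimizing hyperparameter proposed in the study is not reproduced here, so `alpha` is left as a free input. Setting `alpha = 0` recovers the sample proportions, `alpha = 1` corresponds to the uniform prior, and `alpha = 0.5` to the Jeffreys prior.

```python
from math import sqrt


def posterior_mean_probs(counts, alpha):
    """Posterior means of the cell probabilities under a symmetric
    Dirichlet(alpha, ..., alpha) prior (assumed form of the prior).
    With alpha = 0 these reduce to the sample proportions."""
    n = sum(sum(row) for row in counts)          # total sample size
    k = sum(len(row) for row in counts)          # number of cells
    return [[(x + alpha) / (n + k * alpha) for x in row] for row in counts]


def cramers_v(probs):
    """Cramér's V computed from a table of cell probabilities."""
    r, c = len(probs), len(probs[0])
    row = [sum(p) for p in probs]
    col = [sum(probs[i][j] for i in range(r)) for j in range(c)]
    # Mean-square contingency: squared departure from independence,
    # scaled by the product of the marginal probabilities.
    phi2 = sum(
        (probs[i][j] - row[i] * col[j]) ** 2 / (row[i] * col[j])
        for i in range(r)
        for j in range(c)
        if row[i] * col[j] > 0
    )
    return sqrt(phi2 / (min(r, c) - 1))


def plug_in_cramers_v(counts, alpha=0.0):
    """Plug the posterior-mean cell probabilities into the index."""
    return cramers_v(posterior_mean_probs(counts, alpha))


if __name__ == "__main__":
    table = [[10, 0], [0, 10]]
    for a in (0.0, 0.5, 1.0):
        print(a, plug_in_cramers_v(table, a))
```

In this sketch a positive `alpha` shrinks the estimated index toward zero for sparse tables, which is the mechanism through which the choice of hyperparameter affects bias and MSE.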