Various measures have been proposed in two-way contingency table analysis to express the strength of association between the row and column variables. Tomizawa et al. (2004) proposed more general measures based on the power-divergence, which include Cram\'er's coefficient. In this paper, we propose measures based on the $f$-divergence, a wider class that contains the power-divergence. Unlike statistical hypothesis tests, these measures quantify the association structure in contingency tables. Our contribution is to prove that a measure built from a function satisfying the conditions of the $f$-divergence has desirable properties for measuring the strength of association in contingency tables. This result makes it easy to construct a new measure from a divergence whose properties are essential to the analyst. As an example, we conduct numerical experiments with a measure based on the $\theta$-divergence. Furthermore, the proposed measures allow a further interpretation of the association between the row and column variables that could not be obtained with the conventional measures. We also show a relationship between the proposed measures and the correlation coefficient in the bivariate normal distribution of the latent variables underlying the contingency table.
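To make the central quantity concrete, the following is a minimal sketch of an $f$-divergence between the observed cell probabilities $p_{ij}$ and the independence structure $p_{i\cdot}p_{\cdot j}$, written as $\sum_{i,j} p_{i\cdot}p_{\cdot j}\, f\bigl(p_{ij}/(p_{i\cdot}p_{\cdot j})\bigr)$. It assumes this standard form of the divergence and requires all marginal probabilities to be positive. The function names (`f_divergence`, `kl_generator`, `power_generator`, `cramers_v_squared`) are illustrative and do not come from the paper, and the normalization that turns the divergence into the proposed measures is not given in the abstract, so it is not reproduced here. The only special case shown is the well-known one in which the power-divergence with $\lambda = 1$ equals half of Pearson's $\phi^2$, from which Cram\'er's $V^2$ follows.

```python
import numpy as np


def f_divergence(table, f):
    """Return sum_ij q_ij * f(p_ij / q_ij), where q_ij = p_i. * p_.j.

    `table` holds counts or probabilities; all marginals must be positive.
    """
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                              # cell probabilities p_ij
    q = np.outer(p.sum(axis=1), p.sum(axis=0))   # independence structure
    return float(np.sum(q * f(p / q)))


def kl_generator(x):
    """f(x) = x log x, with the convention f(0) = 0 (Kullback-Leibler)."""
    x = np.asarray(x, dtype=float)
    safe = np.where(x > 0, x, 1.0)               # avoid log(0); x * log(1) = 0
    return x * np.log(safe)


def power_generator(lam):
    """f(x) = (x^(lam+1) - x) / (lam (lam+1)), the power-divergence generator."""
    def f(x):
        return (x ** (lam + 1) - x) / (lam * (lam + 1))
    return f


def cramers_v_squared(table):
    """Cramer's V^2 via phi^2 = 2 * (power-divergence with lam = 1)."""
    r, c = np.shape(table)
    phi2 = 2.0 * f_divergence(table, power_generator(1.0))
    return phi2 / min(r - 1, c - 1)


# Hypothetical example tables (not data from the paper).
independent = np.outer([0.3, 0.7], [0.4, 0.6])   # exact independence
diagonal = np.diag([0.2, 0.3, 0.5])              # complete association
```

Under these assumptions, the divergence vanishes for `independent`, while `cramers_v_squared(diagonal)` reaches its upper bound of 1. Any other convex generator with $f(1) = 0$ can be passed to `f_divergence` in the same way.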