Measuring dependence between two events, or equivalently between two binary random variables, amounts to summarizing the dependence structure of a $2\times 2$ contingency table as a real number between $-1$ and $1$. Countless such dependence measures exist, but there is little theoretical guidance on how they compare or on their respective advantages and shortcomings, so practitioners may be overwhelmed by the problem of choosing a suitable measure. We provide a set of natural desirable properties that a proper dependence measure should fulfill. We show that Yule's Q and the little-known Cole coefficient are proper, while the most widely used measures, the phi coefficient and all contingency coefficients, are improper. The improper measures have a severe attainability problem: even under perfect dependence they can be very far from $-1$ and $1$, and they often differ substantially from the proper measures in that they understate the strength of dependence. The structural reason is that they measure equality of events rather than dependence. We derive the limiting distributions of the measures, which are non-standard in some instances, and illustrate how asymptotically valid confidence intervals can be constructed. In a case study on drug consumption, we demonstrate how misleading conclusions may arise from the use of improper dependence measures.
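The attainability problem described in the abstract can be made concrete with a small numerical sketch. The formulas for the phi coefficient and Yule's Q below are the standard textbook forms. The Cole coefficient is assumed here to be the covariance of the two event indicators divided by its largest attainable magnitude given the margins (the Fréchet bounds); this is one common way of writing it and may differ in presentation from the paper's own definition. The function names and the example table are illustrative, not taken from the paper.

```python
import math


def phi(a, b, c, d):
    """Phi coefficient of the 2x2 table [[a, b], [c, d]].

    a = count(A and B), b = count(A and not B),
    c = count(not A and B), d = count(not A and not B).
    Assumes all four marginal totals are positive.
    """
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def yule_q(a, b, c, d):
    """Yule's Q: (ad - bc) / (ad + bc). Assumes ad + bc > 0."""
    return (a * d - b * c) / (a * d + b * c)


def cole(a, b, c, d):
    """Cole coefficient, written here (as an assumption) as the covariance
    normalized by its Frechet bound for the given margins.
    Assumes 0 < P(A) < 1 and 0 < P(B) < 1."""
    n = a + b + c + d
    p_ab = a / n
    p_a = (a + b) / n
    p_b = (a + c) / n
    cov = p_ab - p_a * p_b
    if cov >= 0:
        # Largest possible P(A and B) given the margins is min(P(A), P(B)).
        bound = min(p_a, p_b) - p_a * p_b
    else:
        # Smallest possible P(A and B) given the margins is max(0, P(A)+P(B)-1).
        bound = p_a * p_b - max(0.0, p_a + p_b - 1.0)
    return cov / bound


if __name__ == "__main__":
    # Hypothetical table with perfect positive dependence: A implies B,
    # but the margins differ (P(A) = 0.1, P(B) = 0.5).
    table = (10, 0, 40, 50)
    print(f"phi    = {phi(*table):.3f}")
    print(f"Yule Q = {yule_q(*table):.3f}")
    print(f"Cole   = {cole(*table):.3f}")
```

For this hypothetical table, in which event A never occurs without event B, Yule's Q and the Cole coefficient both equal 1, whereas phi is only about 0.33. Phi can reach 1 only when the two events coincide, which is consistent with the abstract's remark that it measures equality of events rather than dependence.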