Measuring dependence between two events, or equivalently between two binary random variables, amounts to expressing the dependence structure inherent in a $2\times 2$ contingency table as a real number between $-1$ and $1$. Countless such dependence measures exist, but there is little theoretical guidance on how they compare or on their advantages and shortcomings, so practitioners may be overwhelmed by the problem of choosing a suitable measure. We provide a set of natural desirable properties that a proper dependence measure should fulfill. We show that Yule's Q and the little-known Cole coefficient are proper, while the most widely used measures, the phi coefficient and all contingency coefficients, are improper. The improper measures have a severe attainability problem: even under perfect dependence they can be very far from $-1$ and $1$, and they often differ substantially from the proper measures in that they understate the strength of dependence. The structural reason is that they measure equality of events rather than dependence. We derive the (in some instances non-standard) limiting distributions of the measures and illustrate how asymptotically valid confidence intervals can be constructed. In a case study on drug consumption we demonstrate how misleading conclusions may arise from the use of improper dependence measures.
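The attainability problem can be illustrated with a small numerical sketch. It uses the standard textbook formulas for Yule's Q and the phi coefficient on a $2\times 2$ table with cell counts $a, b, c, d$. The counts are hypothetical, and we assume here that perfect positive dependence includes the case where one event implies the other (the cell "A and not B" is empty). The function names are ours and chosen for illustration only.

```python
import math


def yules_q(a, b, c, d):
    """Yule's Q for the 2x2 table [[a, b], [c, d]].

    Undefined (division by zero) when a*d + b*c == 0.
    """
    return (a * d - b * c) / (a * d + b * c)


def phi_coefficient(a, b, c, d):
    """Phi coefficient for the 2x2 table [[a, b], [c, d]].

    Undefined (division by zero) when any row or column total is zero.
    """
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


# Hypothetical counts; rows are A / not A, columns are B / not B.
# b = 0 means that A never occurs without B, i.e. A implies B.
a, b, c, d = 10, 0, 40, 50

print(yules_q(a, b, c, d))                    # 1.0
print(round(phi_coefficient(a, b, c, d), 3))  # 0.333
```

In this table Yule's Q attains $1$, while the phi coefficient is only $1/3$, because the two events have very different marginal frequencies (10 versus 50 out of 100) and are therefore far from equal.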