This paper develops an intuitive concept of perfect dependence between two variables, at least one of which has a nominal scale. Perfect dependence is attainable for all marginal distributions. The paper furthermore proposes a set of dependence measures that equal 1 if and only if this perfect dependence holds. These measures have two advantages over classical dependence measures such as contingency coefficients, Goodman-Kruskal's lambda and tau, and the so-called uncertainty coefficient. First, they are defined even when one of the variables is (partly) continuous. Second, they satisfy the property of attainability: they can take every value in the interval [0,1], whatever the marginals involved. Classical dependence measures share neither property. They require two discrete marginal distributions, and in some situations they yield values close to 0 even though the dependence is strong or even perfect. Additionally, the paper provides a consistent estimator for one of the new dependence measures, together with its asymptotic distribution both under independence and in the general case. This makes it possible to construct confidence intervals and an independence test, which a subsequent simulation study shows to have good finite-sample properties. Finally, two applications, on the dependence between country and income and between country and religion, illustrate the use of the new measure.
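As a hedged illustration of the attainability problem for one classical measure (not the measures proposed in the paper), the sketch below computes Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n)) for a perfectly dependent 3×3 table. For a k×k table with exactly one nonzero cell in each row and column, chi2 = n(k − 1), so C = sqrt((k − 1)/k) < 1 however strong the association. The function names and the toy table are illustrative assumptions, not taken from the paper.

```python
from math import sqrt


def chi_square(table):
    """Return (Pearson chi-square statistic, sample size) for a table given as rows of counts.

    Assumes every row and column total is positive, so no expected count is zero.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def contingency_coefficient(table):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n))."""
    stat, n = chi_square(table)
    return sqrt(stat / (stat + n))


# Toy data (hypothetical): each row category maps to exactly one column category,
# i.e. the two variables determine each other completely.
table = [[10, 0, 0], [0, 20, 0], [0, 0, 30]]
print(round(contingency_coefficient(table), 4))  # 0.8165, i.e. sqrt(2/3), not 1
```

The same computation for a perfectly dependent 2×2 table gives sqrt(1/2) ≈ 0.7071, so the upper bound of C depends on the table dimensions rather than on the strength of the dependence.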