In asymmetric two-way contingency tables, where the row variable is explanatory and the column variable is the response, it is important to quantify how much the explanatory variable contributes to predicting the response variable. One way to do so is with an association measure, which expresses the degree of association on a scale from $0$ to $1$. Among the many measures that have been proposed, those based on proportional reduction in error (PRE) are particularly notable for their simplicity and intuitive interpretation. These measures, including Goodman-Kruskal's lambda proposed in 1954, are widely implemented in statistical software such as R and SAS and remain extensively used. However, a well-known limitation of PRE measures is that they can return a value of $0$ even when the variables are not independent. This issue arises because the measures are constructed solely from the maximum joint and marginal probabilities and therefore do not make full use of the information available in the contingency table. To address this problem, we propose an extension of PRE measures to the proportional reduction in error with multiple categories. We examine the properties of the proposed measures and demonstrate their utility through numerical experiments. The results suggest their potential as practical tools in applied statistics.
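The limitation described above can be illustrated with the classical Goodman-Kruskal lambda for predicting the column (response) category from the row (explanatory) category, $\lambda = (\sum_i \max_j p_{ij} - \max_j p_{\cdot j}) / (1 - \max_j p_{\cdot j})$. The following is a minimal sketch of that standard measure only, not of the proposed extension; the function names and the example counts are hypothetical and chosen purely for illustration.

```python
def goodman_kruskal_lambda(table):
    """Classical lambda for predicting the column category from the row category.

    `table` is a list of rows of non-negative integer counts (illustrative input format).
    """
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    max_col = max(col_totals)
    if n == max_col:
        raise ValueError("lambda is undefined when one response category holds all counts")
    # Sum over rows of the largest cell in each row (the maximum joint counts).
    sum_row_max = sum(max(row) for row in table)
    return (sum_row_max - max_col) / (n - max_col)


def is_independent(table):
    """Exact check of n * n_ij == n_i. * n_.j for every cell (integer counts)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return all(
        n * table[i][j] == row_totals[i] * col_totals[j]
        for i in range(len(table))
        for j in range(len(col_totals))
    )


# Hypothetical counts: the same column is modal in every row,
# so lambda is 0 although the table is not independent.
dependent_table = [[40, 10], [30, 20]]
print(goodman_kruskal_lambda(dependent_table), is_independent(dependent_table))

# Hypothetical counts where the modal column differs by row.
print(goodman_kruskal_lambda([[40, 10], [10, 40]]))
```

In the first table the modal response category is the same in every row, so knowing the row does not change the best single prediction and lambda is $0$, even though the exact independence check fails. This is the situation in which a measure built only on maximum probabilities discards the remaining information in the table.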