Pearson's correlation coefficient is a popular statistical measure for summarizing the strength of association between two continuous variables. It is usually interpreted via its square, as the percentage of the variance of one variable that is explained by the other in a linear regression model. It can be generalized to multiple regression via the coefficient of determination, which is not straightforward to interpret in terms of prediction accuracy. In this paper, we propose to assess the prediction accuracy of a linear model via the prediction interval reduction (PIR), obtained by comparing the width of the prediction interval derived from this model with the width of the prediction interval obtained without it. At the population level, PIR is in one-to-one correspondence with the correlation and with the coefficient of determination. In particular, a correlation of 0.5 corresponds to a PIR of only 13%. PIR is also the complement to one of the coefficient of alienation introduced at the beginning of the last century. We argue that PIR is easily interpretable and serves as a useful reminder of how difficult it is to make accurate individual predictions, an important message in the era of precision medicine and artificial intelligence. Different estimates of PIR are compared in the context of a linear model, and an extension of the PIR concept to non-linear models is outlined.
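The population-level relationship can be checked numerically. Assuming normally distributed outcomes, both prediction intervals have widths proportional to the relevant standard deviation: σ_Y without the model and σ_Y·√(1 − ρ²) with it. Their ratio is the coefficient of alienation √(1 − ρ²), so PIR = 1 − √(1 − ρ²) = 1 − √(1 − R²). The sketch below makes that arithmetic explicit and inverts it. The function names are hypothetical. `simulated_pir` is only a rough large-sample illustration: it uses normal-quantile intervals and ignores parameter-estimation uncertainty, and it is not one of the estimators compared in the paper.

```python
import math
import random
import statistics


def pir_from_correlation(rho: float) -> float:
    """Population PIR implied by a correlation rho (hypothetical helper)."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    alienation = math.sqrt(1.0 - rho ** 2)  # coefficient of alienation
    return 1.0 - alienation


def pir_from_r_squared(r2: float) -> float:
    """Same quantity expressed through the coefficient of determination."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("R^2 must lie in [0, 1]")
    return 1.0 - math.sqrt(1.0 - r2)


def correlation_for_pir(pir: float) -> float:
    """|rho| needed to reach a given population PIR (inverse mapping)."""
    if not 0.0 <= pir <= 1.0:
        raise ValueError("PIR must lie in [0, 1]")
    return math.sqrt(1.0 - (1.0 - pir) ** 2)


def simulated_pir(rho: float, n: int = 100_000, seed: int = 1) -> float:
    """Rough Monte Carlo check on bivariate normal data (assumed setting)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Construct y so that corr(x, y) = rho and var(y) = 1.
        y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)

    # Ordinary least-squares fit of y on x.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

    # 95% normal-quantile interval widths; z cancels in the ratio but is kept
    # so that both quantities read as actual interval widths.
    z = statistics.NormalDist().inv_cdf(0.975)
    width_without_model = 2.0 * z * statistics.stdev(ys)
    width_with_model = 2.0 * z * statistics.stdev(resid)
    return 1.0 - width_with_model / width_without_model


if __name__ == "__main__":
    for rho in (0.3, 0.5, 0.7, 0.9):
        print(f"rho={rho:.1f}  R^2={rho ** 2:.2f}  PIR={pir_from_correlation(rho):.1%}")
    print(f"|rho| needed for PIR = 50%: {correlation_for_pir(0.5):.3f}")
    print(f"simulated PIR at rho = 0.5: {simulated_pir(0.5):.3f}")
```

For ρ = 0.5 the formula gives 1 − √0.75 ≈ 0.134, the 13% quoted above. Running the inverse mapping shows that halving the interval width already requires |ρ| = √0.75 ≈ 0.87.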