Kling-Gupta linear regression

Although the Kling-Gupta efficiency ($\mathrm{KGE}$) is widely adopted for model evaluation in hydrology, its properties as an estimation criterion remain unexplored. Investigating these properties is necessary because parameter estimation and forecast evaluation are inherently linked. To address this, we formalize the negatively oriented Kling-Gupta loss $L_\mathrm{KG} = (1 - \mathrm{KGE})^2$ within an extremum estimation framework (minimizing this loss is equivalent to maximizing $\mathrm{KGE}$, since $\mathrm{KGE} \le 1$) and analyze its behavior in multiple linear regression. We derive explicit formulas for the parameter estimates, showing that Kling-Gupta linear regression scales the ordinary least squares (OLS) coefficient vector by a variance-inflation factor determined by the sample variances and covariances of the predictors and the response. On the training set, Kling-Gupta predictions reproduce the sample variance of the response, whereas OLS predictions inherently have reduced variance; both estimators preserve the sample mean of the observations and attain the same sample correlation between predictions and response. We show analytically that, in general, no single estimator can maximize both the Nash-Sutcliffe efficiency ($\mathrm{NSE}$) and $\mathrm{KGE}$: the OLS estimator attains the maximum possible $\mathrm{NSE}$ but not the maximum $\mathrm{KGE}$, while the Kling-Gupta estimator maximizes $\mathrm{KGE}$ at the cost of $\mathrm{NSE}$. We prove that the Kling-Gupta estimator converges almost surely to well-defined population limits, which we express in closed form. Finally, we evaluate training- and test-set performance metrics for both estimators and show that, for each estimator, the metrics on the training set and on an independent test set converge to the same asymptotic limit, although these limits differ between OLS and Kling-Gupta regression.
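The estimator described in the abstract can be illustrated with a short NumPy sketch. It assumes the original Gupta et al. (2009) form of KGE, built from the correlation r, the standard-deviation ratio alpha and the mean ratio beta, and a response with nonzero mean (variants such as a coefficient-of-variation ratio exist). Under those assumptions, the correlation depends only on the direction of the slope vector, its magnitude controls alpha, and the intercept controls beta. KGE is therefore maximized by keeping the OLS slope direction, rescaling it so the fitted values match the sample standard deviation of the response, and re-centering the intercept on the sample mean. The rescaling factor then works out to 1/R, where R is the multiple correlation coefficient of the OLS fit; this expression is derived here for illustration and may be written differently in the paper. The helper names (`kge`, `nse`, `fit_ols`, `fit_kling_gupta`) are hypothetical.

```python
import numpy as np


def kge(y_obs, y_sim):
    """KGE in the original Gupta et al. (2009) form (an assumption here):
    1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)."""
    r = np.corrcoef(y_obs, y_sim)[0, 1]      # linear correlation
    alpha = np.std(y_sim) / np.std(y_obs)     # variability (SD) ratio
    beta = np.mean(y_sim) / np.mean(y_obs)    # bias (mean) ratio; needs mean(y_obs) != 0
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def nse(y_obs, y_sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / (total sum of squares)."""
    sse = np.sum((y_obs - y_sim) ** 2)
    sst = np.sum((y_obs - np.mean(y_obs)) ** 2)
    return 1.0 - sse / sst


def fit_ols(X, y):
    """OLS with an intercept; returns (intercept, slope vector)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]


def fit_kling_gupta(X, y):
    """Hypothetical sketch of the KGE-maximizing linear fit.

    Keeps the OLS slope direction (maximizes r), rescales it so the fitted
    values have the sample SD of y (alpha = 1), and sets the intercept so
    the fitted values have the sample mean of y (beta = 1).
    """
    _, b_ols = fit_ols(X, y)
    fitted_sd = np.std(X @ b_ols)  # SD of OLS fitted values (the intercept adds no spread)
    if fitted_sd == 0:
        raise ValueError("OLS fit explains no variance; rescaling factor undefined")
    # Rescaling factor (derived here, not quoted from the paper): equals 1 / R >= 1.
    scale = np.std(y) / fitted_sd
    b_kg = scale * b_ols
    a_kg = np.mean(y) - np.mean(X, axis=0) @ b_kg
    return a_kg, b_kg


if __name__ == "__main__":
    # Synthetic example with a positive response mean so beta is well defined.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))
    y = 10.0 + X @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=500)

    a_ols, b_ols = fit_ols(X, y)
    a_kg, b_kg = fit_kling_gupta(X, y)
    for name, pred in [("OLS", a_ols + X @ b_ols), ("KG", a_kg + X @ b_kg)]:
        print(
            f"{name:>3}: mean {np.mean(pred):.3f} (obs {np.mean(y):.3f}), "
            f"SD ratio {np.std(pred) / np.std(y):.3f}, "
            f"r {np.corrcoef(y, pred)[0, 1]:.3f}, "
            f"NSE {nse(y, pred):.3f}, KGE {kge(y, pred):.3f}"
        )
```

Under the same assumptions, the training-set metrics have simple closed forms: OLS gives NSE = R² and KGE = 1 − √2(1 − R), while the Kling-Gupta fit gives NSE = 2R − 1 and KGE = R. Because R² − (2R − 1) = (1 − R)² ≥ 0, OLS wins on NSE and the Kling-Gupta fit wins on KGE whenever R < 1, which is the trade-off stated in the abstract.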