Bias-variance decompositions are widely used to understand the generalization performance of machine learning models. While the squared error loss permits a straightforward decomposition, other loss functions, such as the $0$-$1$ loss or the $L_1$ loss, either do not yield bias and variance terms that sum to the expected loss or rely on definitions that lack the essential properties of a meaningful bias and variance. Recent research has shown that clean decompositions can be achieved for the broader class of Bregman divergences, with the cross-entropy loss as a special case. However, the necessary and sufficient conditions for these decompositions remain an open question. In this paper, we address this question by studying continuous, nonnegative loss functions that satisfy the identity of indiscernibles (zero loss if and only if the two arguments are identical), under mild regularity conditions. We prove that so-called $g$-Bregman or rho-tau divergences are the only such loss functions that have a clean bias-variance decomposition. A $g$-Bregman divergence can be transformed into a standard Bregman divergence through an invertible change of variables. This makes the squared Mahalanobis distance, up to such a variable transformation, the only symmetric loss function with a clean bias-variance decomposition. Consequently, common losses such as the $0$-$1$ and $L_1$ losses cannot admit a clean bias-variance decomposition, explaining why previous attempts have failed. We also examine how relaxing these restrictions on the loss functions affects our results.
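The "straightforward decomposition" for squared error mentioned above is the classic identity $\mathbb{E}[(y - f)^2] = (y - \mathbb{E}[f])^2 + \mathrm{Var}[f]$, where $f$ is a predictor that is random (e.g. because it is trained on a random training set). The sketch below, with an arbitrary choice of target and prediction distribution, verifies numerically that squared bias and variance sum exactly to the expected loss:

```python
import numpy as np

# Illustrative sketch (not from the paper): verify the clean
# bias-variance decomposition for squared error,
#   E[(y - f)^2] = (y - E[f])^2 + Var[f].
rng = np.random.default_rng(0)
y = 1.5                            # fixed target
f = rng.normal(2.0, 0.7, 100_000)  # predictions from randomly trained models

expected_loss = np.mean((y - f) ** 2)
bias_sq = (y - f.mean()) ** 2      # squared bias of the mean prediction
variance = f.var()                 # variance of the predictions (ddof=0)

# The two terms sum to the expected loss exactly (up to float rounding),
# with no residual cross-term -- this is what "clean" means here.
print(abs(expected_loss - (bias_sq + variance)) < 1e-9)
```

For the $0$-$1$ or $L_1$ loss, no choice of bias and variance terms with the usual properties makes this identity hold, which is the failure the paper's main theorem explains.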