Several recent studies have reported negative results when using heteroskedastic neural regression models to model real-world data. In particular, for overparameterized models, the mean and variance networks are powerful enough to either fit every single data point (while shrinking the predicted variances to zero), or to learn a constant prediction with an output variance exactly matching every predicted residual (i.e., explaining the targets as pure noise). This paper studies these difficulties from the perspective of statistical physics. We show that the observed instabilities are not specific to any neural network architecture but are already present in a field theory of an overparameterized conditional Gaussian likelihood model. Under light assumptions, we derive a nonparametric free energy that can be solved numerically. The resulting solutions show excellent qualitative agreement with empirical model fits on real-world data and, in particular, prove the existence of phase transitions, i.e., abrupt, qualitative differences in the behaviors of the regressors upon varying the regularization strengths on the two networks. Our work thus provides a theoretical explanation for the necessity to carefully regularize heteroskedastic regression models. Moreover, the insights from our theory suggest a scheme for optimizing this regularization which is quadratically more efficient than the naive approach.
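The two failure modes described above can be illustrated numerically with a minimal sketch. It is not the field theory or the regularizer used in this work: the synthetic data, the function names `gaussian_nll` and `roughness`, and the finite-difference smoothness penalty are all assumptions introduced for illustration only. The sketch evaluates the conditional Gaussian negative log-likelihood directly on per-point means and variances, which mimics an overparameterized model that can set them freely.

```python
import numpy as np


def gaussian_nll(y, mean, var):
    """Average negative log-likelihood of y under N(mean, var), per data point."""
    return float(np.mean(0.5 * np.log(2.0 * np.pi * var) + 0.5 * (y - mean) ** 2 / var))


def roughness(f):
    """Hypothetical smoothness penalty: mean squared finite difference of f."""
    return float(np.mean(np.diff(f) ** 2))


# Synthetic 1-D regression data (an assumption, not data from the paper).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(x.shape)

# Failure mode 1: the mean interpolates every target while the variance shrinks.
# The residuals are zero, so the NLL falls without bound as the variance -> 0.
nll_interp = [gaussian_nll(y, y, np.full_like(y, v)) for v in (1e-2, 1e-4, 1e-8)]

# Failure mode 2: constant mean, variance equal to each squared residual,
# i.e. the targets are explained as pure noise.
const_mean = np.full_like(y, y.mean())
noise_var = (y - const_mean) ** 2
nll_noise = gaussian_nll(y, const_mean, noise_var)

# A penalty on the mean function charges the interpolating solution
# but not the constant one; a penalty on log-variance does the opposite.
mean_penalty_interp = roughness(y)
mean_penalty_const = roughness(const_mean)
logvar_penalty_noise = roughness(np.log(noise_var))
logvar_penalty_interp = roughness(np.log(np.full_like(y, 1e-8)))

print("NLL, interpolating mean, shrinking variance:", nll_interp)
print("NLL, constant mean, variance = squared residual:", nll_noise)
print("mean roughness (interpolating vs constant):", mean_penalty_interp, mean_penalty_const)
print("log-variance roughness (noise vs flat):", logvar_penalty_noise, logvar_penalty_interp)
```

Without any penalty, the first solution can be made arbitrarily good by shrinking the variance, which is why some regularization of both the mean and the variance function is needed. How the two penalties should be weighted against each other is the question the abstract addresses; the sketch does not attempt to answer it.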