This paper focuses on the optimal implementation of a Mean Variance Estimation network (MVE network) (Nix and Weigend, 1994). This type of network is often used as a building block for uncertainty estimation methods in a regression setting, for instance Concrete dropout (Gal et al., 2017) and Deep Ensembles (Lakshminarayanan et al., 2017). Specifically, an MVE network assumes that the data are generated from a normal distribution whose mean and variance are both functions of the input. The network outputs a mean estimate and a variance estimate, and its parameters are optimized by minimizing the negative log-likelihood. In this paper, we discuss two points. First, the convergence difficulties reported in recent work can be prevented relatively easily by following the original authors' recommendation to use a warm-up period, during which only the mean is optimized while the variance is held fixed. This recommendation is often not followed in practice, and we demonstrate experimentally how essential this step is. We also examine whether keeping the mean estimate fixed after the warm-up leads to different results than estimating the mean and the variance simultaneously after the warm-up; we do not observe a substantial difference. Second, we propose a novel improvement of the MVE network: separate regularization of the mean and the variance estimates. We demonstrate, both on toy examples and on a number of benchmark UCI regression data sets, that following the original recommendations and applying the novel separate regularization can lead to significant improvements.
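The training objective described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`gaussian_nll`, `l2_penalty`, `mve_loss`), the choice of an L2 penalty, the regularization constants, and the fixed warm-up variance of 1.0 are all assumptions made for illustration. The sketch only evaluates the loss for given predictions; a real network would also need to keep the variance output positive, for example through a softplus or exponential transformation.

```python
import math


def gaussian_nll(y, mean, var):
    """Average negative log-likelihood of targets y under N(mean_i, var_i)."""
    total = 0.0
    for y_i, m_i, v_i in zip(y, mean, var):
        total += 0.5 * math.log(2.0 * math.pi * v_i) + (y_i - m_i) ** 2 / (2.0 * v_i)
    return total / len(y)


def l2_penalty(weights):
    """Sum of squared weights (assumed form of regularization)."""
    return sum(w * w for w in weights)


def mve_loss(y, mean, var, mean_weights, var_weights,
             reg_mean=1e-3, reg_var=1e-2, warmup=False, fixed_var=1.0):
    """Regularized MVE objective with hypothetical, separately chosen constants.

    During warm-up the predicted variance is ignored and replaced by a
    constant, so only the mean estimate (and its penalty) affects the loss.
    """
    if warmup:
        constant_var = [fixed_var] * len(y)
        return gaussian_nll(y, mean, constant_var) + reg_mean * l2_penalty(mean_weights)
    # After warm-up: full likelihood, with a separate penalty per sub-network.
    return (gaussian_nll(y, mean, var)
            + reg_mean * l2_penalty(mean_weights)
            + reg_var * l2_penalty(var_weights))
```

With a fixed variance, the warm-up loss reduces to a scaled squared error plus a constant, which is why only the mean is fitted in that phase. Using two constants (`reg_mean`, `reg_var`) rather than one lets the variance estimate be regularized more or less strongly than the mean estimate.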