It is becoming increasingly common in regression to train neural networks that model the entire distribution, even if only the mean is required for prediction. This additional modeling often yields a performance gain, but the reasons behind the improvement are not fully understood. This paper investigates a recent approach to regression, the Histogram Loss, which learns the conditional distribution of the target variable by minimizing the cross-entropy between a target distribution and a flexible histogram prediction. We design theoretical and empirical analyses to determine why and when this performance gain appears, and how different components of the loss contribute to it. Our results suggest that the benefits of learning distributions in this setup come from improvements in optimization rather than from modeling extra information. We then demonstrate the viability of the Histogram Loss in common deep learning applications without the need for costly hyperparameter tuning.
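To make the loss described above concrete, the following is a minimal sketch of a cross-entropy between a target distribution and a histogram prediction. The abstract does not specify the details, so everything here is an assumption for illustration: the uniform bins, the truncated Gaussian used as the target distribution, the softmax parameterization of the histogram, and all function names (`bin_edges`, `gaussian_target`, `histogram_loss`, `predicted_mean`) are hypothetical and not taken from the paper.

```python
import math


def bin_edges(y_min, y_max, num_bins):
    """Uniform bin edges over [y_min, y_max] (assumed support of the target)."""
    width = (y_max - y_min) / num_bins
    return [y_min + i * width for i in range(num_bins + 1)]


def gaussian_target(y, edges, sigma):
    """Mass in each bin of a Gaussian N(y, sigma^2) truncated to the support.

    Assumption: y lies inside (or near) the support, so the total mass is > 0.
    """
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - y) / (sigma * math.sqrt(2.0))))

    masses = [cdf(hi) - cdf(lo) for lo, hi in zip(edges[:-1], edges[1:])]
    total = sum(masses)
    return [m / total for m in masses]  # renormalize after truncation


def softmax(logits):
    """Numerically stable softmax: turns network outputs into bin probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def histogram_loss(logits, target):
    """Cross-entropy -sum_i p_i * log q_i, with q = softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(p * (z - log_z) for p, z in zip(target, logits))


def predicted_mean(logits, edges):
    """Point prediction: expected bin center under the predicted histogram."""
    probs = softmax(logits)
    centers = [(lo + hi) / 2.0 for lo, hi in zip(edges[:-1], edges[1:])]
    return sum(q * c for q, c in zip(probs, centers))


if __name__ == "__main__":
    edges = bin_edges(0.0, 10.0, 20)
    target = gaussian_target(3.2, edges, sigma=0.5)
    logits = [0.0] * 20  # an untrained, uniform histogram prediction
    print(histogram_loss(logits, target), predicted_mean(logits, edges))
```

In this sketch the network would output one logit per bin, the loss would be minimized over those logits, and the mean of the predicted histogram would serve as the regression output when only a point prediction is needed.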