Energy-based models (EBMs) have experienced a resurgence within machine learning in recent years, including as a promising alternative for probabilistic regression. However, energy-based regression requires a proposal distribution to be manually designed for training, and an initial estimate has to be provided at test-time. We address both of these issues by introducing a conceptually simple method to automatically learn an effective proposal distribution, which is parameterized by a separate network head. To this end, we derive a surprising result, leading to a unified training objective that jointly minimizes the KL divergence from the proposal to the EBM, and the negative log-likelihood of the EBM. At test-time, we can then employ importance sampling with the trained proposal to efficiently evaluate the learned EBM and produce stand-alone predictions. Furthermore, we utilize our derived training objective to learn mixture density networks (MDNs) with a jointly trained energy-based teacher, consistently outperforming conventional MDN training on four real-world regression tasks within computer vision. Code is available at https://github.com/fregu856/ebms_proposals.
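As a rough illustration of the test-time step, the snippet below shows generic importance sampling for a 1D energy-based regression model p(y|x) = exp(f(x, y)) / Z(x). It estimates log Z(x), the negative log-likelihood of an observed target, and a self-normalized estimate of the EBM mean from samples drawn from a proposal q(y|x). This is a minimal sketch under stated assumptions, not the authors' implementation. The function names (`is_estimates`, `nll`, `score_fn`) are hypothetical. The proposal is assumed to be a single Gaussian with fixed parameters, whereas the abstract describes a proposal parameterized by a separate, trained network head. The jointly trained objective (KL divergence plus EBM negative log-likelihood) is not reproduced here.

```python
import numpy as np


def log_normal_pdf(y, mean, std):
    """Log-density of N(mean, std^2); stands in for the proposal q(y|x) (assumption)."""
    return -0.5 * np.log(2.0 * np.pi * std**2) - 0.5 * ((y - mean) / std) ** 2


def logsumexp(a):
    """Numerically stable log(sum(exp(a)))."""
    a_max = np.max(a)
    return a_max + np.log(np.sum(np.exp(a - a_max)))


def is_estimates(score_fn, q_mean, q_std, num_samples=100_000, seed=0):
    """Importance-sampling estimates for an EBM p(y|x) = exp(f(x, y)) / Z(x).

    score_fn is a hypothetical stand-in for f(x, .) at one fixed input x.
    Returns (log_Z_hat, snis_mean).
    """
    rng = np.random.default_rng(seed)
    y = rng.normal(q_mean, q_std, size=num_samples)          # y_m ~ q(y|x)
    log_w = score_fn(y) - log_normal_pdf(y, q_mean, q_std)   # log[exp(f(x, y_m)) / q(y_m|x)]
    # Z(x) ~= (1/M) * sum_m exp(f(x, y_m)) / q(y_m|x), computed in log space.
    log_Z_hat = logsumexp(log_w) - np.log(num_samples)
    # Self-normalized importance weights give an estimate of E_p[y], a point prediction.
    w_bar = np.exp(log_w - logsumexp(log_w))
    snis_mean = np.sum(w_bar * y)
    return log_Z_hat, snis_mean


def nll(score_fn, y_obs, log_Z_hat):
    """Approximate -log p(y_obs|x) = -f(x, y_obs) + log Z(x)."""
    return -score_fn(y_obs) + log_Z_hat


# Toy check: f(y) = -y^2 / 2 is an unnormalized standard normal, so the true log Z is 0.5 * log(2*pi).
score = lambda y: -0.5 * y**2
log_Z_hat, mean_hat = is_estimates(score, q_mean=0.0, q_std=2.0)
print(log_Z_hat, 0.5 * np.log(2 * np.pi))
print(mean_hat, nll(score, 0.0, log_Z_hat))
```

The proposal is deliberately wider than the toy target here, since importance sampling degrades quickly when q has lighter tails than p; a learned proposal that closely matches the EBM keeps the weight variance low, which is what makes test-time evaluation efficient.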