An Energy-Based Prior for Generative Saliency

We propose a novel generative saliency prediction framework that adopts an informative energy-based model as its prior distribution. The energy-based prior is defined on the latent space of a saliency generator network, which generates the saliency map from continuous latent variables and an observed image. The parameters of the saliency generator and the energy-based prior are jointly trained via Markov chain Monte Carlo-based maximum likelihood estimation, in which sampling from the intractable posterior and prior distributions of the latent variables is performed by Langevin dynamics. With the generative saliency model, we can obtain a pixel-wise uncertainty map from an image, indicating the model's confidence in its saliency prediction. Unlike existing generative models, which define the prior distribution of the latent variables as a simple isotropic Gaussian, our model uses an informative energy-based prior that can be more expressive in capturing the latent space of the data. With this prior, we relax the Gaussian assumption of existing generative models to obtain a more representative distribution over the latent space, leading to more reliable uncertainty estimation. We apply the proposed framework to both RGB and RGB-D salient object detection tasks, with both transformer and convolutional neural network backbones. We further propose an adversarial learning algorithm and a variational inference algorithm as alternatives for training the proposed generative framework. Experimental results show that our generative saliency model with an energy-based prior achieves not only accurate saliency predictions but also reliable uncertainty maps that are consistent with human perception. Results and code are available at https://github.com/JingZhang617/EBMGSOD.
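
The Langevin sampling step mentioned above can be illustrated with a minimal Python sketch. It draws latent samples from a toy energy-based density p(z) ∝ exp(-E(z)). The quadratic energy, the names `toy_energy_grad` and `langevin_sample`, the step size, and the number of steps are all illustrative assumptions. They are not the learned energy network or the settings used in our experiments.

```python
import random


def toy_energy_grad(z, mu=1.0, sigma=0.5):
    """Gradient of a toy energy E(z) = sum((z_i - mu)^2) / (2 * sigma^2).

    Hypothetical stand-in for the gradient of a learned energy network.
    """
    return [(zi - mu) / sigma ** 2 for zi in z]


def langevin_sample(grad_fn, z0, n_steps=200, step_size=0.1, rng=None):
    """Run unadjusted Langevin dynamics targeting p(z) ∝ exp(-E(z)).

    Update rule: z <- z - (s^2 / 2) * grad E(z) + s * eps, eps ~ N(0, I).
    """
    if rng is None:
        rng = random.Random(0)
    z = list(z0)
    for _ in range(n_steps):
        grad = grad_fn(z)
        z = [
            zi - 0.5 * step_size ** 2 * gi + step_size * rng.gauss(0.0, 1.0)
            for zi, gi in zip(z, grad)
        ]
    return z


if __name__ == "__main__":
    rng = random.Random(0)
    # Start every chain at the origin and keep only the final state.
    draws = [langevin_sample(toy_energy_grad, [0.0], rng=rng)[0] for _ in range(500)]
    print("sample mean:", sum(draws) / len(draws))
```

For this toy energy the target is a Gaussian with mean `mu` and standard deviation `sigma`, so the empirical mean and variance of the final chain states give a quick check that the sampler behaves as intended. In the full model the same update would be applied with the gradient of a learned energy function (for the prior) or of the joint log-density (for the posterior).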
