Learning image classification and image generation using the same set of network parameters is a challenging problem. Recent advanced approaches that perform well in one task often perform poorly in the other. This work introduces an energy-based classifier and generator, EGC, which achieves superior performance in both tasks using a single neural network. Unlike a conventional classifier that outputs a label given an image (i.e., a conditional distribution $p(y|\mathbf{x})$), the forward pass in EGC is a classifier that outputs a joint distribution $p(\mathbf{x},y)$, which enables an image generator in its backward pass by marginalizing out the label $y$. Concretely, the forward pass estimates the energy and classification probability of a noisy image, and the backward pass estimates the score function used to denoise it. EGC achieves competitive generation results compared with state-of-the-art approaches on ImageNet-1k, CelebA-HQ and LSUN Church, while achieving superior classification accuracy and robustness against adversarial attacks on CIFAR-10. This work represents the first successful attempt to simultaneously excel in both tasks using a single set of network parameters. We believe that EGC bridges the gap between discriminative and generative learning.
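The forward/backward split described above can be illustrated with a toy sketch. It is not the EGC implementation: it assumes the joint distribution is parameterized as $p(\mathbf{x},y) \propto \exp(f(\mathbf{x})[y])$, so that marginalizing out $y$ gives the energy $E(\mathbf{x}) = -\log\sum_y \exp(f(\mathbf{x})[y])$ and the score is $-\nabla_{\mathbf{x}} E(\mathbf{x})$. A hypothetical linear model `W x + b` stands in for the neural network, and its analytic gradient stands in for backpropagation; all function names, the step size, and the example values are illustrative assumptions.

```python
import math

def logits(x, W, b):
    # Toy "network" (assumption): one linear logit per class, f(x)[y] = W[y] . x + b[y].
    return [sum(w * xi for w, xi in zip(row, x)) + by for row, by in zip(W, b)]

def logsumexp(v):
    # Numerically stable log(sum(exp(v))).
    m = max(v)
    return m + math.log(sum(math.exp(a - m) for a in v))

def class_probs(x, W, b):
    # Forward pass, classification head: p(y | x) = softmax(f(x)).
    f = logits(x, W, b)
    lse = logsumexp(f)
    return [math.exp(a - lse) for a in f]

def energy(x, W, b):
    # Forward pass, energy head: marginalizing y out of exp(f(x)[y])
    # gives E(x) = -logsumexp(f(x)), up to the unknown normalizing constant.
    return -logsumexp(logits(x, W, b))

def score(x, W, b):
    # "Backward pass": score = -dE/dx. For linear logits this equals
    # sum_y p(y | x) * W[y]; a real network would use autodiff instead.
    p = class_probs(x, W, b)
    return [sum(py * row[i] for py, row in zip(p, W)) for i in range(len(x))]

def denoise_step(x, W, b, step=0.1):
    # Move the input along the score, i.e. toward lower energy.
    s = score(x, W, b)
    return [xi + step * si for xi, si in zip(x, s)]

# Hypothetical 3-class model on 2-dimensional inputs.
W = [[1.0, -0.5], [0.2, 0.8], [-1.0, 0.3]]
b = [0.0, 0.1, -0.2]
x = [0.5, -1.0]

probs = class_probs(x, W, b)       # classification output
x_next = denoise_step(x, W, b)     # one generation/denoising update
```

The same forward computation yields both the class probabilities and the energy, and a single gradient with respect to the input supplies the denoising direction. This sketch omits the noise-level conditioning and the training objectives that an actual model of this kind would need.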