We propose an information-theoretic knowledge distillation approach for compressing generative adversarial networks, which maximizes the mutual information between teacher and student networks via variational optimization based on an energy-based model. Because directly computing the mutual information in continuous domains is intractable, our approach instead optimizes the student network by maximizing a variational lower bound of the mutual information. To achieve a tight lower bound, we introduce an energy-based model, built on a deep neural network, to represent a flexible variational distribution that effectively handles high-dimensional images and accounts for spatial dependencies between pixels. Since the proposed method is a generic optimization algorithm, it can be conveniently incorporated into arbitrary generative adversarial networks and even dense prediction networks, e.g., image enhancement models. We demonstrate that the proposed algorithm consistently achieves outstanding performance in compressing generative adversarial networks when combined with several existing models.
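
The variational lower bound mentioned above can be illustrated with the standard bound on mutual information. This is a sketch under assumed notation, not necessarily the exact formulation used in the paper: $t$ and $s$ denote teacher and student outputs, $q_\theta(t \mid s)$ is the variational distribution, and $E_\theta$ is a hypothetical energy function given by a deep neural network.

```latex
% Assumed notation: t = teacher output, s = student output,
% q_\theta = variational distribution, E_\theta = energy function (illustrative names).
\begin{align}
I(T; S) &= H(T) - H(T \mid S)
         = H(T) + \mathbb{E}_{p(t,s)}\bigl[\log p(t \mid s)\bigr] \\
        &= H(T) + \mathbb{E}_{p(t,s)}\bigl[\log q_\theta(t \mid s)\bigr]
           + \mathbb{E}_{p(s)}\Bigl[ D_{\mathrm{KL}}\bigl(p(t \mid s) \,\|\, q_\theta(t \mid s)\bigr) \Bigr] \\
        &\ge H(T) + \mathbb{E}_{p(t,s)}\bigl[\log q_\theta(t \mid s)\bigr],
\end{align}
% One possible energy-based parameterization of the variational distribution:
\begin{equation}
q_\theta(t \mid s) = \frac{\exp\bigl(-E_\theta(t, s)\bigr)}{Z_\theta(s)},
\qquad
Z_\theta(s) = \int \exp\bigl(-E_\theta(t', s)\bigr)\, dt'.
\end{equation}
```

The bound is tight when $q_\theta(t \mid s)$ matches the true conditional $p(t \mid s)$, which is why a flexible variational family matters. Since $H(T)$ does not depend on the student, maximizing the bound with respect to the student reduces to maximizing the expected log-likelihood term.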