We introduce EGIC, an enhanced generative image compression method that efficiently traverses the distortion-perception curve with a single model. EGIC is based on two novel building blocks: i) OASIS-C, a conditional, pre-trained semantic-segmentation-guided discriminator that provides spatially and semantically aware gradient feedback to the generator, conditioned on the latent image distribution, and ii) Output Residual Prediction (ORP), a retrofit solution for multi-realism image compression that controls the synthesis process by adjusting the impact of the residual between an MSE-optimized and a GAN-optimized decoder output on the GAN-based reconstruction. Together, these components form a powerful codec that outperforms state-of-the-art diffusion and GAN-based methods (e.g., HiFiC, MS-ILLM, and DIRAC-100) while performing almost on par with VTM-20.0 on the distortion end. EGIC is simple to implement, very lightweight, and provides excellent interpolation characteristics, which makes it a promising candidate for practical applications targeting the low bit-rate range.
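The residual-based control described for ORP can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes the residual between the two decoder outputs is scaled linearly by a hypothetical factor `alpha`, and it operates on flat lists of pixel values rather than image tensors. The function name `orp_blend` and the parameter `alpha` are illustrative choices, not names from the paper.

```python
from typing import List, Sequence


def orp_blend(x_mse: Sequence[float], x_gan: Sequence[float], alpha: float) -> List[float]:
    """Blend two decoder outputs by scaling the residual between them.

    Assumption (not stated in the source): the residual is scaled linearly,
    so alpha = 0 returns the MSE-optimized output and alpha = 1 returns the
    GAN-optimized output.
    """
    if len(x_mse) != len(x_gan):
        raise ValueError("decoder outputs must have the same number of samples")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Residual between the GAN-optimized and the MSE-optimized reconstruction.
    residual = [g - m for m, g in zip(x_mse, x_gan)]
    # Add a fraction of that residual back onto the MSE-optimized output.
    return [m + alpha * r for m, r in zip(x_mse, residual)]


# Toy flattened "images" standing in for the two decoder outputs.
x_mse = [0.50, 0.25, 0.75]
x_gan = [0.75, 0.25, 0.25]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, orp_blend(x_mse, x_gan, alpha))
```

Under this assumption, sweeping `alpha` from 0 to 1 moves the reconstruction from the distortion-oriented end toward the perception-oriented end without retraining or re-encoding, which is the kind of single-model traversal the abstract describes.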