Recent advancements in neural compression have surpassed traditional codecs in PSNR and MS-SSIM measurements. However, at low bit-rates, these methods can introduce visually displeasing artifacts, such as blurring, color shifting, and texture loss, thereby compromising the perceptual quality of images. To address these issues, this study presents an enhanced neural compression method designed for optimal visual fidelity. We train our model with a semantic ensemble loss that integrates Charbonnier loss, perceptual loss, style loss, and a non-binary adversarial loss to enhance the perceptual quality of image reconstructions. Additionally, we implement a latent refinement process to generate content-aware latent codes. These codes adhere to bit-rate constraints, balance the trade-off between distortion and fidelity, and prioritize bit allocation to regions of greater importance. Our empirical findings demonstrate that this approach significantly improves the statistical fidelity of neural image compression. On the CLIC2024 validation set, our approach achieves a 62% bitrate saving compared to MS-ILLM under the FID metric.
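To make the ensemble loss named in the abstract more concrete, the following is a minimal sketch of its Charbonnier term and of a weighted sum over loss terms. It is not the authors' implementation: the function names `charbonnier_loss` and `ensemble_loss`, the default `eps` value, and the weights are all assumptions chosen for illustration, and the perceptual, style, and adversarial terms are represented only by placeholder numbers because the abstract does not specify them.

```python
import math


def charbonnier_loss(pred, target, eps=1e-3):
    """Mean of sqrt((p - t)^2 + eps^2) over flattened pixel values.

    The eps default is an assumed value, not one taken from the paper.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    total = sum(math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target))
    return total / len(pred)


def ensemble_loss(terms, weights):
    """Weighted sum of named loss terms (hypothetical weighting scheme)."""
    return sum(weights[name] * value for name, value in terms.items())


# Illustrative usage with placeholder values for the non-Charbonnier terms.
pred = [0.2, 0.5, 0.9]
target = [0.0, 0.5, 1.0]
terms = {
    "charbonnier": charbonnier_loss(pred, target),
    "perceptual": 0.30,   # placeholder value
    "style": 0.10,        # placeholder value
    "adversarial": 0.05,  # placeholder value
}
weights = {"charbonnier": 1.0, "perceptual": 1.0, "style": 1.0, "adversarial": 1.0}
total_loss = ensemble_loss(terms, weights)
```

The Charbonnier term behaves like an L1 penalty for large residuals while staying differentiable at zero, which is why it is often preferred to a plain absolute difference in reconstruction losses.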