Neural image compression has surpassed state-of-the-art traditional codecs (H.266/VVC) in rate-distortion (RD) performance, but it suffers from high complexity and requires separate models for different rate-distortion trade-offs. In this paper, we propose an Efficient single-model Variable-bit-rate Codec (EVC), which runs at 30 FPS on 768x512 input images and still outperforms VVC in RD performance. By further reducing both encoder and decoder complexity, our small model even achieves 30 FPS on 1920x1080 input images. To bridge the performance gap between our models of different capacities, we design mask decay, which automatically transforms the large model's parameters into the small model. We also propose a novel sparsity regularization loss to mitigate the shortcomings of $L_p$ regularization. Our algorithm narrows the performance gap by 50% and 30% for our medium and small models, respectively. Finally, we advocate a scalable encoder for neural image compression, whose encoding complexity can be adjusted dynamically to meet different latency requirements. We propose decaying the large encoder multiple times to reduce the residual representation progressively. Both mask decay and residual representation learning greatly improve the RD performance of our scalable encoder. Our code is at https://github.com/microsoft/DCVC.
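The abstract does not spell out how mask decay works, so the following is only a minimal sketch of the general idea under stated assumptions: weights outside a hypothetical keep-mask receive an extra pull toward zero during training, so that removing them later changes the model very little. The function names, the plain L2-style penalty, and the hyperparameters are illustrative assumptions; in particular, the penalty is a stand-in and not the sparsity regularization loss proposed in the paper.

```python
def mask_decay_step(weights, keep_mask, grads, lr=0.1, decay=0.5):
    """One gradient step where only masked-out weights get an extra decay term.

    weights, grads: lists of floats.
    keep_mask: list of 0/1 flags (1 = kept in the small model, 0 = to be removed).
    The L2-style penalty is an assumption made for illustration.
    """
    updated = []
    for w, keep, g in zip(weights, keep_mask, grads):
        penalty = 0.0 if keep else decay * w  # pull masked-out weights toward zero
        updated.append(w - lr * (g + penalty))
    return updated


def prune(weights, keep_mask):
    """Drop the masked-out weights once they have decayed."""
    return [w for w, keep in zip(weights, keep_mask) if keep]


weights = [1.0, -2.0, 0.5, 3.0]
keep_mask = [1, 0, 1, 0]
zero_grads = [0.0] * len(weights)  # task gradient omitted to isolate the decay

for _ in range(200):
    weights = mask_decay_step(weights, keep_mask, zero_grads)

small = prune(weights, keep_mask)
```

In this toy setting the kept weights are untouched while the masked-out weights shrink geometrically, so pruning them at the end removes almost nothing of value. In real training the task gradient would be non-zero and the kept weights would adapt while the others decay.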