Recent advances in neural image codecs (NICs) have delivered significant compression performance, but limited attention has been paid to their error resilience. The resulting NICs tend to be sensitive to packet losses, which are prevalent in real-time communications. In this paper, we investigate how to strengthen the resilience of NICs against packet losses. We propose ResiComp, a pioneering neural image compression framework with feature-domain packet loss concealment (PLC). Motivated by the inherent consistency between generation and compression, we advocate merging the tasks of entropy modeling and PLC into a unified framework focused on latent space context modeling. To this end, we draw inspiration from the impressive generative capabilities of large language models (LLMs), particularly recent advances in masked visual token modeling (MVTM). During training, we integrate MVTM to mirror the effects of packet loss, enabling a dual-functional Transformer to restore masked latents by predicting both their missing values and their conditional probability mass functions. ResiComp jointly optimizes compression efficiency and loss resilience. Moreover, it provides flexible coding modes that allow the efficiency-resilience trade-off to be adjusted explicitly in response to varying Internet or wireless network conditions. Extensive experiments demonstrate that ResiComp significantly enhances the resilience of NICs against packet losses while exhibiting a favorable trade-off between compression efficiency and loss resilience.
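To make the training-time masking idea concrete, the following is a minimal sketch of how MVTM-style masking could mimic packet loss on a grid of quantized latent tokens. This is an illustrative assumption, not the paper's actual implementation: the function name `simulate_packet_loss_mask`, the packet size, and the use of `-1` as a stand-in for the `[MASK]` token are all hypothetical choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_packet_loss_mask(num_tokens, loss_rate, packet_size=16, rng=rng):
    """Mask contiguous packet-sized groups of latent tokens.

    Real packet loss drops whole packets, so masked tokens appear in
    contiguous runs rather than independently per token.
    """
    num_packets = (num_tokens + packet_size - 1) // packet_size
    lost = rng.random(num_packets) < loss_rate          # per-packet loss draw
    mask = np.repeat(lost, packet_size)[:num_tokens]    # expand to token level
    return mask  # True = token lost

# Toy latent sequence: 64 quantized symbols in [0, 256).
latents = rng.integers(0, 256, size=64)
mask = simulate_packet_loss_mask(64, loss_rate=0.25)
observed = np.where(mask, -1, latents)  # -1 marks a masked ("lost") token
```

During training, a dual-functional Transformer would take `observed` as input and be supervised to predict, for each masked position, both the missing symbol and its conditional probability mass function; at inference the same model serves as the entropy model when no loss occurs and as the PLC module when packets are dropped.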