Recent neural image codecs (NICs) have achieved significant compression performance, but limited attention has been paid to their error resilience. The resulting NICs tend to be sensitive to packet losses, which are prevalent in real-time communications. In this paper, we investigate how to improve the resilience of NICs against packet losses. We propose ResiComp, a pioneering neural image compression framework with feature-domain packet loss concealment (PLC). Motivated by the inherent consistency between generation and compression, we advocate merging the tasks of entropy modeling and PLC into a unified framework focused on latent space context modeling. To this end, we take inspiration from the impressive generative capabilities of large language models (LLMs), particularly the recent advances in masked visual token modeling (MVTM). During training, we integrate MVTM to mirror the effects of packet loss, enabling a dual-functional Transformer to restore the masked latents by predicting their missing values and conditional probability mass functions. ResiComp jointly optimizes compression efficiency and loss resilience. Moreover, ResiComp provides flexible coding modes, allowing the efficiency-resilience trade-off to be adjusted explicitly in response to varying Internet or wireless network conditions. Extensive experiments demonstrate that ResiComp significantly enhances the resilience of NICs against packet losses while exhibiting a worthwhile trade-off between compression efficiency and packet loss resilience.
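The idea of using masking during training to mirror packet loss can be illustrated with a minimal sketch. This is not the ResiComp implementation: the round-robin packetization, the `MASK` placeholder (standing in for a learned mask token), and the function name `simulate_packet_loss` are all assumptions made for illustration. The sketch only shows how dropping whole packets translates into a mask over latent tokens, which a context model would then be trained to fill in.

```python
import random

MASK = None  # hypothetical stand-in for a learned mask token


def simulate_packet_loss(latents, num_packets, loss_rate, seed=0):
    """Drop whole packets at random and mask the latent tokens they carried.

    latents:     list of quantized latent tokens (any values)
    num_packets: number of packets the tokens are spread over
    loss_rate:   probability that each packet is lost, in [0, 1]
    Returns (masked_latents, lost_flags), where lost_flags[i] is True
    when token i was carried by a lost packet.
    """
    rng = random.Random(seed)
    # Decide independently for each packet whether it is lost.
    lost_packets = [rng.random() < loss_rate for _ in range(num_packets)]
    # Assumed packetization: token i travels in packet i % num_packets.
    lost_flags = [lost_packets[i % num_packets] for i in range(len(latents))]
    # Replace the tokens of lost packets with the mask placeholder.
    masked = [MASK if lost else tok for tok, lost in zip(latents, lost_flags)]
    return masked, lost_flags


latents = list(range(12))
masked, lost_flags = simulate_packet_loss(latents, num_packets=4, loss_rate=0.5)
print(masked)
print(lost_flags)
```

Under these assumptions, raising `loss_rate` during training exposes the model to heavier masking, which loosely corresponds to the efficiency-resilience trade-off described above.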