GRACE++: Loss-Resilient Real-Time Video through Neural Codecs

In real-time video communication, retransmitting lost packets over high-latency networks is not viable due to strict latency requirements. To counter packet losses without retransmission, two primary strategies are employed -- encoder-based forward error correction (FEC) and decoder-based error concealment. The former encodes data with redundancy before transmission, yet determining the optimal redundancy level in advance proves challenging. The latter reconstructs video from partially received frames, but dividing a frame into independently coded partitions inherently compromises compression efficiency, and the lost information cannot be effectively recovered by the decoder without adapting the encoder. We present a loss-resilient real-time video system called GRACE++, which preserves the user's quality of experience (QoE) across a wide range of packet losses through a new neural video codec. Central to GRACE++'s enhanced loss resilience is its joint training of the neural encoder and decoder under a spectrum of simulated packet losses. In lossless scenarios, GRACE++ achieves video quality on par with conventional codecs (e.g., H.265). As the loss rate escalates, GRACE++ exhibits a more graceful, less pronounced decline in quality, consistently outperforming other loss-resilient schemes. Through extensive evaluation on various videos and real network traces, we demonstrate that GRACE++ reduces undecodable frames by 95% and stall duration by 90% compared with FEC, while markedly boosting video quality over error concealment methods. In a user study with 240 crowdsourced participants and 960 subjective ratings, GRACE++ registers a 38% higher mean opinion score (MOS) than other baselines.
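The idea of training under "a spectrum of simulated packet losses" can be illustrated with a minimal sketch. It is not the GRACE++ implementation: the function names, the interleaved element-to-packet mapping, the zero-filling of lost elements, and the uniform range of loss rates are all assumptions made for illustration. The sketch only shows how an encoder output might be corrupted before it reaches the decoder during training, so that both networks see the effect of loss; the neural encoder, decoder, and training loop are left out.

```python
import random

def simulate_packet_loss(latent, num_packets, loss_rate, rng):
    """Return a copy of `latent` with the elements of lost packets set to 0.0.

    Assumption: element i is carried by packet i % num_packets (interleaving),
    and each packet is lost independently with probability `loss_rate`.
    """
    lost = [rng.random() < loss_rate for _ in range(num_packets)]
    return [0.0 if lost[i % num_packets] else v for i, v in enumerate(latent)]

def sample_training_loss_rate(rng, max_rate=0.8):
    """Draw a loss rate for one training step (uniform range is an assumption)."""
    return rng.uniform(0.0, max_rate)

# Stand-in for an encoder output; a real system would use the codec's latent tensor.
rng = random.Random(0)
latent = [float(i + 1) for i in range(12)]

loss_rate = sample_training_loss_rate(rng)
received = simulate_packet_loss(latent, num_packets=4, loss_rate=loss_rate, rng=rng)
print(f"sampled loss rate {loss_rate:.2f}:", received)
```

In a joint training setup of this kind, the decoder would reconstruct the frame from `received` rather than from `latent`, and the reconstruction error would be used to update the encoder and decoder together.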
