Grace++: Loss-Resilient Real-Time Video Communication under High Network Latency

In real-time video, resending packets, especially over high-latency networks, can cause stuttering, poor video quality, and user frustration. Despite extensive research, current real-time video systems still rely on redundancy to handle packet loss, which compromises quality in the absence of packet loss. Because packet loss is hard to predict, these systems strengthen their loss resilience only after loss occurs, leaving some frames insufficiently protected against burst losses. They may also keep adding too much redundancy after the loss has subsided. We present Grace++, a new real-time video communication system. With Grace++, (i) a video frame can be decoded as long as any non-empty subset of its packets is received, (ii) quality degrades gracefully as more packets are lost, and (iii) quality approximates that of a standard codec (such as H.265) in the absence of packet loss. To achieve this, Grace++ encodes and decodes frames with neural networks (NNs). It uses a new packetization scheme under which packet loss has the same effect as randomly masking (zeroing) a subset of elements in the NN-encoded output, and its NN encoder and decoder are specially trained to achieve decent quality when a random subset of those elements is masked. Using various test videos and real network traces, we show that the quality of Grace++ is slightly lower than that of H.265 when no packets are lost. When packet loss occurs, however, Grace++ reduces the 95th-percentile frame delay (the time between encoding a frame and decoding it) by 2x compared with other loss-resilient schemes while achieving comparable quality. This is because Grace++ requires neither retransmission of packets (unless all of a frame's packets are lost) nor frame skipping.
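
The idea that packet loss can be made to look like random masking is easiest to see in a small sketch. What follows is a hypothetical illustration under stated assumptions, not the actual Grace++ packetization. It assumes that the NN-encoded output of a frame is a flat list of floats. It also assumes that sender and receiver share a seed from which both derive the same pseudo-random permutation of element positions, and that elements are dealt round-robin across packets. The names `packetize`, `depacketize`, and `random_mask` are invented for this example. Under these assumptions, a lost packet zeroes a scattered subset of elements. `random_mask` imitates that effect for training by masking each element independently, which is also an assumption.

```python
import random
from typing import Dict, List


def _shared_permutation(length: int, seed: int) -> List[int]:
    """Pseudo-random order of element positions.

    Assumption: sender and receiver agree on `seed`, so both can rebuild
    the same permutation without transmitting it.
    """
    order = list(range(length))
    random.Random(seed).shuffle(order)
    return order


def packetize(latent: List[float], num_packets: int, seed: int) -> List[List[float]]:
    """Hypothetical: deal permuted elements round-robin into packets."""
    order = _shared_permutation(len(latent), seed)
    packets: List[List[float]] = [[] for _ in range(num_packets)]
    for rank, idx in enumerate(order):
        packets[rank % num_packets].append(latent[idx])
    return packets


def depacketize(received: Dict[int, List[float]], length: int,
                num_packets: int, seed: int) -> List[float]:
    """Rebuild the full-size decoder input from whichever packets arrived.

    Elements carried by missing packets stay 0.0, i.e. they are masked.
    """
    order = _shared_permutation(length, seed)
    out = [0.0] * length
    for rank, idx in enumerate(order):
        pid = rank % num_packets
        if pid in received:
            # Within packet `pid`, this element sits at position rank // num_packets.
            out[idx] = received[pid][rank // num_packets]
    return out


def random_mask(latent: List[float], loss_rate: float,
                rng: random.Random) -> List[float]:
    """Hypothetical training augmentation: zero each element with prob. loss_rate."""
    return [0.0 if rng.random() < loss_rate else x for x in latent]


if __name__ == "__main__":
    latent = [float(i + 1) for i in range(10)]
    packets = packetize(latent, num_packets=3, seed=7)
    # Simulate losing packet 1 and receiving packets 0 and 2.
    decoder_input = depacketize({0: packets[0], 2: packets[2]}, len(latent), 3, seed=7)
    # The elements carried by packet 1 (3 of 10 here) are now zero.
    print(decoder_input)
    print(random_mask(latent, 0.3, random.Random(0)))
```

In this sketch, any single received packet already produces a full-size, partially zeroed input for the decoder. That mirrors the property that a frame can be decoded from any non-empty subset of its packets. Scattering positions pseudo-randomly, rather than sending contiguous chunks, is what makes a lost packet resemble the random masks seen during training.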
