Packet loss during video conferencing often degrades video quality and causes video freezes. Retransmitting lost packets is usually impractical because playback must happen in real time. Recovering lost packets with Forward Error Correction (FEC) is also challenging, because the appropriate redundancy level is difficult to determine in advance. To address these issues, we introduce Reparo, a loss-resilient video conferencing framework based on generative deep learning models. When a frame or part of a frame is lost, Reparo generates the missing information. This generation is conditioned on the data received so far and draws on the model's learned understanding of how people and objects appear and interact in the visual world. Experiments on publicly available video conferencing datasets show that Reparo outperforms state-of-the-art FEC-based video conferencing solutions both in video quality (measured by PSNR, SSIM, and LPIPS) and in the frequency of video freezes.
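To illustrate why choosing an FEC redundancy level is hard, the sketch below computes the probability that a frame can be decoded for a given amount of parity. It is an illustrative model, not part of Reparo. It assumes an ideal MDS erasure code (such as Reed-Solomon), where a frame of `k` data packets plus `r` parity packets is recoverable if and only if at most `r` packets are lost. It also assumes independent (Bernoulli) packet losses. The function name `frame_recovery_prob` is hypothetical.

```python
from math import comb


def frame_recovery_prob(k: int, r: int, loss_rate: float) -> float:
    """Probability that a frame of k data packets plus r parity packets decodes.

    Assumptions (illustrative only):
    - ideal MDS erasure code: any k of the n = k + r packets suffice, so
      decoding succeeds iff at most r packets are lost;
    - each packet is lost independently with probability loss_rate
      (real network losses are often bursty, which this ignores).
    """
    n = k + r
    # Sum the binomial probabilities of losing 0, 1, ..., r packets.
    return sum(
        comb(n, lost) * loss_rate**lost * (1 - loss_rate) ** (n - lost)
        for lost in range(r + 1)
    )


if __name__ == "__main__":
    k = 10  # hypothetical number of data packets per frame
    for loss in (0.01, 0.05, 0.10):
        for r in (1, 2, 4):
            p = frame_recovery_prob(k, r, loss)
            print(f"loss={loss:.0%} parity={r} overhead={r / k:.0%} P(decode)={p:.4f}")
```

Under this model, a parity budget that almost always suffices at a low loss rate leaves many frames undecodable when the loss rate rises. Provisioning for the worst case, on the other hand, spends bandwidth on redundancy that is wasted most of the time. This tradeoff between overhead and robustness is the difficulty the abstract refers to.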