Packet loss in video conferencing often causes poor video quality and freezes. Retransmitting lost packets is usually impractical because of the real-time playback requirement. Recovering lost packets with Forward Error Correction (FEC) is challenging because the appropriate level of redundancy is difficult to determine. In this paper, we propose Reparo, a framework for loss-resilient video conferencing based on generative deep learning models. When a frame or part of a frame is lost, our approach generates the missing information, conditioned on the data received so far and on the model's knowledge of how people look, dress, and interact in the visual world. Our experiments on publicly available video conferencing datasets show that Reparo outperforms state-of-the-art FEC-based video conferencing in terms of both video quality (measured by PSNR) and video freezes.