Standardized lossy video coding is at the core of almost all real-world video processing pipelines. Rate control enables standard codecs to adapt to different network bandwidth conditions or storage constraints. However, standard video codecs (e.g., H.264) and their rate control modules aim to minimize video distortion with respect to human quality assessment. We demonstrate empirically that standard-coded videos severely degrade the performance of deep vision models. To overcome this degradation, this paper presents the first end-to-end learnable deep video codec control that considers both bandwidth constraints and downstream deep vision performance while adhering to existing standardization. We demonstrate that our approach preserves downstream deep vision performance better than traditional approaches.