Standardized lossy video coding is at the core of almost all real-world video processing pipelines. Rate control enables standard codecs to adapt to varying network bandwidth conditions or storage constraints. However, standard video codecs (e.g., H.264) and their rate control modules aim to minimize video distortion as judged by human quality assessment, not by the needs of downstream vision models. We demonstrate empirically that standard-coded videos severely degrade the performance of deep vision models. To overcome this degradation, this paper presents the first end-to-end learnable deep video codec control that considers both bandwidth constraints and downstream deep vision performance while adhering to existing standardization. We demonstrate that our approach preserves downstream deep vision performance better than traditional standard video coding.