Most video platforms offer streaming at several quality levels, and the level is usually set by the video resolution, so high-resolution videos must be downsampled before compression. To address video coding at different resolutions, we propose a rate-guided arbitrary rescaling network (RARN) that resizes video before encoding. To make the RARN compatible with standard codecs and to produce compression-friendly results, we introduce an iteratively optimized transformer-based virtual codec (TVC) that simulates the key components of video encoding and performs bitrate estimation. By iteratively training the TVC and the RARN, we achieve a 5%-29% BD-Rate reduction relative to a linear-interpolation anchor under different encoding configurations and resolutions, outperforming previous methods on most test videos. Furthermore, the lightweight RARN processes FHD (1080p) content in real time (91 FPS) while still delivering a considerable rate reduction.
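For readers unfamiliar with the metric, the following is a minimal sketch of how a BD-Rate figure of this kind is commonly computed, assuming the usual Bjøntegaard procedure: fit a cubic polynomial to log-bitrate as a function of quality for each rate-distortion curve, integrate both fits over the shared quality range, and convert the average log-rate gap into a percentage. The function name `bd_rate`, the use of PSNR as the quality measure, and the four rate points are illustrative assumptions; the abstract does not state which implementation or quality metric was used, and the numbers below are not results from this work.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate difference (%) of the test curve versus the anchor.

    Hypothetical helper following the common Bjontegaard recipe;
    negative values mean the test curve needs fewer bits at equal quality.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Cubic fit of log-rate as a function of quality for each curve.
    fit_a = np.polyfit(pa, log_ra, 3)
    fit_t = np.polyfit(pt, log_rt, 3)

    # Integrate only over the quality interval covered by both curves.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())
    int_a = np.polyint(fit_a)
    int_t = np.polyint(fit_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # Average log-rate gap converted to a percentage.
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0


# Made-up rate points (kbps) and PSNR values (dB), for illustration only.
anchor_rates = [1000.0, 2000.0, 4000.0, 8000.0]
anchor_psnr = [32.0, 35.0, 38.0, 41.0]
# A test curve that spends 20% fewer bits at every quality level.
test_rates = [0.8 * r for r in anchor_rates]

print(bd_rate(anchor_rates, anchor_psnr, test_rates, anchor_psnr))  # close to -20.0
```

Under this convention, the reported 5%-29% reduction corresponds to BD-Rate values between roughly -5 and -29 against the linear-interpolation anchor.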