This paper investigates the efficacy of jointly optimizing content-specific post-processing filters to adapt a human-oriented video/image codec into a codec suitable for machine vision tasks. Observing that the artifacts produced by video/image codecs are content-dependent, we propose a novel training strategy based on competitive learning principles. The strategy dynamically assigns each training sample to the filters in a fuzzy manner, so that the winning filter is further optimized on that sample. Inspired by simulated annealing optimization techniques, we employ a softmax function with a temperature variable as the weight allocation function to mitigate the effects of random initialization. Our evaluation, conducted on a system that uses multiple post-processing filters within a Versatile Video Coding (VVC) codec framework, demonstrates the superiority of content-specific filters trained with the proposed strategy, specifically when images are processed in blocks. With the VVC reference software VTM 12.0 as the anchor, experiments on the OpenImages dataset show that, compared to independently trained filters, the BD-rate improves from -41.3% to -42.3% for object detection and from -44.6% to -44.7% for instance segmentation. The filter usage statistics align with our hypothesis and underscore the importance of jointly optimizing filters for both content and reconstruction quality. Our findings pave the way for further improving the performance of video/image codecs.
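The fuzzy, temperature-controlled assignment described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the following are assumptions made only for illustration: the softmax is taken over the negative per-filter losses of a sample, the temperature follows a geometric cooling schedule, and the names `assignment_weights`, `annealed_temperature`, `weighted_loss`, `t_start`, and `t_end` are hypothetical.

```python
import math


def assignment_weights(losses, temperature):
    """Soft assignment of one sample to K filters.

    Assumption: weights are a softmax over the negative per-filter losses,
    so the filter with the lowest loss (the "winner") gets the largest weight.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scores = [-loss / temperature for loss in losses]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def annealed_temperature(step, total_steps, t_start=5.0, t_end=0.05):
    """Hypothetical geometric cooling schedule from t_start down to t_end."""
    frac = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    return t_start * (t_end / t_start) ** frac


def weighted_loss(losses, temperature):
    """Combine per-filter losses using the soft assignment weights."""
    weights = assignment_weights(losses, temperature)
    return sum(w * loss for w, loss in zip(weights, losses))


if __name__ == "__main__":
    # Illustrative per-filter losses for a single training sample.
    sample_losses = [0.9, 0.4, 0.6]
    for step in (0, 50, 99):
        t = annealed_temperature(step, total_steps=100)
        weights = assignment_weights(sample_losses, t)
        print(step, round(t, 3), [round(w, 3) for w in weights])
```

Under these assumptions, a high temperature early in training spreads each sample almost evenly across the filters, which limits the influence of random initialization. As the temperature falls, the weights concentrate on the filter with the lowest loss, approaching a hard winner-take-all assignment.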