While most existing neural image compression (NIC) and neural video compression (NVC) methods have achieved remarkable success, they are optimized primarily for human visual perception. However, with the rapid development of artificial intelligence, many images and videos will be used for various machine vision tasks. Consequently, existing compression methods do not achieve competitive performance on machine vision tasks. In this work, we introduce an efficient adaptive compression (EAC) method tailored for both human perception and multiple machine vision tasks. Our method involves two key modules: 1) an adaptive compression mechanism that selects several subsets of the latent features to balance the optimization for multiple machine vision tasks (e.g., segmentation and detection) and human vision; and 2) a task-specific adapter that uses a parameter-efficient delta-tuning strategy to adapt the downstream analysis networks to specific machine vision tasks. Together, these two modules optimize the bit-rate cost and improve machine vision performance. Our proposed EAC integrates seamlessly with existing NIC (i.e., Ballé2018 and Cheng2020) and NVC (i.e., DVC and FVC) methods. Extensive evaluation on various benchmark datasets (i.e., VOC2007, ILSVRC2012, VOC2012, COCO, UCF101, and DAVIS) shows that our method enhances performance on multiple machine vision tasks while maintaining the quality of human vision.
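As a rough illustration of selecting subsets of latent features per target, the sketch below keeps only the channels assigned to a given task and counts how many latent values would need to be coded. It is a minimal sketch under stated assumptions, not the EAC implementation: the names `TASK_CHANNELS`, `select_subset`, and `count_coded_values` are hypothetical, the latent is a plain list of channels, and the channel subsets are fixed by hand, whereas the abstract describes the selection as adaptive.

```python
from typing import Dict, List

# Hypothetical layout: latent[channel][position].
Latent = List[List[float]]

# Assumption: fixed, hand-picked channel subsets per target.
# The abstract describes an adaptive selection; this table only illustrates the idea.
TASK_CHANNELS: Dict[str, List[int]] = {
    "detection": [0, 1],
    "segmentation": [0, 1, 2],
    "human": [0, 1, 2, 3],
}


def select_subset(latent: Latent, task: str) -> Latent:
    """Keep the channels assigned to `task` and zero out the rest."""
    keep = set(TASK_CHANNELS[task])
    return [
        list(channel) if index in keep else [0.0] * len(channel)
        for index, channel in enumerate(latent)
    ]


def count_coded_values(latent: Latent, task: str) -> int:
    """Count latent values in the selected channels, a crude proxy for bit-rate."""
    return sum(len(latent[index]) for index in TASK_CHANNELS[task])


# Toy latent with 4 channels and 2 positions per channel.
toy_latent: Latent = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
detection_latent = select_subset(toy_latent, "detection")
```

In this toy setting, a machine vision target that needs fewer channels than human viewing would transmit fewer latent values, which is the intuition behind trading bit-rate against task performance.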