Reducing the data footprint of visual content via image compression is essential for lowering storage requirements as well as the bandwidth and latency requirements of transmission. In particular, compressed images allow faster data transfer and faster response times for visual recognition on edge devices that rely on cloud-based services. In this paper, we first analyze the impact of image compression, using traditional codecs as well as recent state-of-the-art neural compression approaches, on three visual recognition tasks: image classification, object detection, and semantic segmentation. We consider a wide range of compression levels, from 0.1 to 2 bits-per-pixel (bpp). We find that for all three tasks, recognition ability is significantly impacted under strong compression. For example, segmentation accuracy drops from 44.5 to 30.5 mIoU when compressing to 0.1 bpp with the best compression model we evaluated. Second, we test to what extent this performance drop can be ascribed to a loss of relevant information in the compressed image, or to a lack of generalization of visual recognition models to images with compression artefacts. We find that the performance loss is to a large extent due to the latter: by finetuning the recognition models on compressed training images, most of the performance loss is recovered. For example, finetuning brings segmentation accuracy back up to 42 mIoU, recovering 82% of the original drop in accuracy.
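
The two quantities the abstract relies on, the bitrate in bpp and the share of the accuracy drop that is recovered, can be made concrete with a short sketch. This is illustrative arithmetic only, not code from the paper: the function names `bits_per_pixel` and `recovered_fraction` and the 512x512 example image are assumptions introduced here, and only the mIoU values (44.5, 30.5, 42) come from the text.

```python
def bits_per_pixel(num_bytes: int, width: int, height: int) -> float:
    """Bitrate of a compressed image: total bits divided by pixel count."""
    return 8 * num_bytes / (width * height)


def recovered_fraction(baseline: float, degraded: float, finetuned: float) -> float:
    """Share of the accuracy drop (baseline - degraded) regained after finetuning."""
    return (finetuned - degraded) / (baseline - degraded)


# Segmentation figures quoted in the abstract (mIoU, compression to 0.1 bpp).
print(f"recovered: {recovered_fraction(44.5, 30.5, 42.0):.0%}")

# Hypothetical example: a 512x512 image stored in 3277 bytes is roughly 0.1 bpp.
print(f"bitrate: {bits_per_pixel(3277, 512, 512):.3f} bpp")
```

With the reported numbers, the drop is 44.5 - 30.5 = 14 mIoU and finetuning regains 42 - 30.5 = 11.5 mIoU, which is the 82% quoted above.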