Image Coding for Machines (ICM) is an image compression technique targeted at image recognition. It is becoming essential as demand for image recognition AI grows. In this paper, we propose SA-ICM, an ICM method that encodes and decodes only the edge information of object parts in an image. SA-ICM is a Learned Image Compression (LIC) model trained using edge information created by Segment Anything. Our method can be used with image recognition models for various tasks. SA-ICM is also robust to changes in input data, which makes it effective across a variety of use cases. In addition, our method offers a privacy benefit: it removes human facial information on the encoder side, so that information is never transmitted. Furthermore, the same training method can be applied to Neural Representations for Videos (NeRV), a video compression model. Training NeRV with edge information created by Segment Anything yields a NeRV that is effective for image recognition (SA-NeRV). Experimental results confirm the advantages of SA-ICM, which achieves the best performance in image compression for image recognition in our experiments. We also show that SA-NeRV is superior to ordinary NeRV in video compression for machines. Code is available at https://github.com/final-0/SA-ICM.
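To make the idea of training on edge information more concrete, the following is a minimal sketch of how an edge map might be derived from segmentation masks and used to restrict a distortion term to edge pixels. It is an illustration under stated assumptions, not the procedure from the paper: the function names `edges_from_masks` and `edge_weighted_mse` are hypothetical, the masks are assumed to be boolean arrays of the kind a segmentation model such as Segment Anything could produce, and the edge-weighted mean squared error is only one plausible choice of distortion.

```python
import numpy as np


def edges_from_masks(masks: np.ndarray) -> np.ndarray:
    """Hypothetical helper: turn (N, H, W) boolean segment masks into an (H, W) edge map.

    A pixel counts as an edge pixel if it belongs to a segment and at least
    one of its 4-neighbours does not belong to that same segment.
    """
    edge = np.zeros(masks.shape[1:], dtype=bool)
    for m in masks:
        # Replicate the border so the image boundary alone does not create edges.
        p = np.pad(m, 1, mode="edge")
        # True where the up, down, left and right neighbours are all inside the mask.
        neighbours_inside = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
        edge |= m & ~neighbours_inside
    return edge


def edge_weighted_mse(x: np.ndarray, x_hat: np.ndarray, edge: np.ndarray) -> float:
    """Assumed distortion term: squared error counted only on edge pixels.

    x and x_hat are (H, W, C) float images; edge is an (H, W) boolean map.
    The mean is taken over all pixels, so non-edge pixels contribute zero.
    """
    w = edge[..., None].astype(x.dtype)
    return float(np.mean((w * (x - x_hat)) ** 2))


# Toy usage: a single 3x3 square segment inside a 5x5 image.
masks = np.zeros((1, 5, 5), dtype=bool)
masks[0, 1:4, 1:4] = True
edge = edges_from_masks(masks)

x = np.zeros((5, 5, 3))
x_hat = np.ones((5, 5, 3))
loss = edge_weighted_mse(x, x_hat, edge)
```

In an actual LIC training loop, a term of this kind would typically be combined with a bitrate estimate into a rate-distortion objective; the weighting between the two is left out here because the abstract does not specify it.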