With advances in deep-learning-based image recognition, automatic video analysis by artificial intelligence is becoming more widespread. As the amount of video used for image recognition increases, efficient compression methods for such video data are needed. In general, when image quality deteriorates due to encoding, image recognition accuracy also falls. In this paper, we therefore propose a neural-network-based approach that improves image recognition accuracy, in particular object detection accuracy, by applying post-processing to the encoded video. Versatile Video Coding (VVC) is used as the video compression method, since it is the latest video coding method with the best encoding performance. The neural network is trained using the features of YOLO-v7, the latest object detection model. By using VVC as the video coding method and YOLO-v7 as the detection model, high object detection accuracy is achieved even at low bit rates. Experimental results show that the combination of the proposed method and VVC achieves better coding performance than regular VVC in terms of object detection accuracy.
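The abstract does not state the training objective, so the following is only one plausible reading of "trained using the features of YOLO-v7", written as a feature-matching loss. All symbols are assumptions introduced for illustration: $x$ is an original frame, $\hat{x}$ is the same frame after VVC encoding and decoding, $P_\theta$ is the post-processing network with parameters $\theta$, $\phi_k$ is the feature map from the $k$-th selected layer of a frozen YOLO-v7 model, and $N_k$ is the number of elements in that feature map.

```latex
% Hypothetical feature-matching objective (an assumption, not the paper's stated loss):
% the post-processed frame should yield detector features close to those of the
% original, uncompressed frame.
\mathcal{L}(\theta)
  = \sum_{k} \frac{1}{N_k}
    \left\lVert \phi_k\!\left(P_\theta(\hat{x})\right) - \phi_k(x) \right\rVert_2^2
```

Under this sketch, only $\theta$ is updated during training, and the detector weights stay fixed, so the post-processing network learns to restore the information the detector relies on, which need not coincide with pixel-level fidelity.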