High-quality face images are required to guarantee the stability and reliability of automatic face recognition (FR) systems in surveillance and security scenarios. However, massive amounts of face data are usually compressed before analysis because of limits on transmission or storage. The compressed images may lose important identity information, degrading the performance of the FR system. Herein, we make the first attempt to study the just noticeable difference (JND) for the FR system, defined as the maximum distortion that the FR system cannot notice. More specifically, we establish a JND dataset comprising 3530 original images and 137,670 compressed images generated with VTM-15.0, the reference encoding/decoding software for the Versatile Video Coding (VVC) standard. Subsequently, we develop a novel JND prediction model that directly infers JND images for the FR system. In particular, to maximize redundancy removal without impairing robust identity information, we use an encoder with multiple feature extraction and attention-based feature decomposition modules to progressively decompose face features into two uncorrelated components, i.e., identity and residual features, via self-supervised learning. The residual feature is then fed into the decoder to generate the residual map. Finally, the predicted JND map is obtained by subtracting the residual map from the original image. Experimental results demonstrate that the proposed model predicts JND maps more accurately than state-of-the-art JND models, and that it saves more bits than VTM-15.0 while maintaining the performance of the FR system.
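To make the JND definition and the final subtraction step concrete, the following is a minimal sketch, not the proposed model. It assumes a hypothetical FR similarity score is already available for each compressed version of an image, ordered from least to most distorted, and that the FR system "notices" a distortion once the score falls below a hypothetical decision threshold. The function names `find_jnd_level` and `jnd_image`, the threshold value, and the 8-bit clipping are all illustrative assumptions.

```python
from typing import List, Sequence


def find_jnd_level(similarities: Sequence[float], threshold: float) -> int:
    """Return the index of the most distorted level the FR system still accepts.

    similarities[i] is an assumed FR similarity score between the original
    image and its compressed version at level i, ordered from least to most
    distorted. Returns -1 if even the mildest level falls below the threshold.
    """
    jnd = -1
    for level, score in enumerate(similarities):
        if score < threshold:
            break  # first level the FR system notices: stop searching
        jnd = level
    return jnd


def jnd_image(original: List[List[int]], residual: List[List[int]]) -> List[List[int]]:
    """Subtract the residual map from the original, pixel by pixel.

    Results are clipped to the 8-bit range [0, 255] (an assumption made
    here for illustration).
    """
    return [
        [min(255, max(0, o - r)) for o, r in zip(orig_row, res_row)]
        for orig_row, res_row in zip(original, residual)
    ]


# Toy values, not taken from the dataset.
scores = [0.98, 0.95, 0.91, 0.72, 0.40]
level = find_jnd_level(scores, threshold=0.8)

original = [[120, 200], [5, 255]]
residual = [[10, -3], [9, 0]]
predicted = jnd_image(original, residual)
```

With these toy values, `level` is 2 (the last level scoring at least 0.8) and `predicted` is `[[110, 203], [0, 255]]`. In the described model the residual map comes from the decoder rather than being supplied by hand.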