Surveillance Facial Image Quality Assessment: A Multi-dimensional Dataset and Lightweight Model

Surveillance facial images are often captured under unconstrained conditions and suffer severe quality degradation from low resolution, motion blur, occlusion, and poor lighting. Although recent face restoration techniques applied to surveillance cameras can significantly enhance visual quality, they often compromise fidelity (i.e., identity-preserving features), which directly conflicts with the primary objective of surveillance imaging: reliable identity verification. Existing facial image quality assessment (FIQA) methods predominantly focus on either visual quality or recognition-oriented evaluation, and thus fail to jointly address visual quality and fidelity, both of which are critical for surveillance applications. To bridge this gap, we present the first comprehensive study on surveillance facial image quality assessment (SFIQA), targeting the unique challenges of surveillance scenarios. Specifically, we first construct SFIQA-Bench, a multi-dimensional quality assessment benchmark consisting of 5,004 surveillance facial images captured by three widely deployed surveillance cameras in real-world scenarios. A subjective experiment is conducted to collect quality ratings along six dimensions (noise, sharpness, colorfulness, contrast, fidelity, and overall quality), covering the key aspects of SFIQA. Furthermore, we propose SFIQA-Assessor, a lightweight multi-task FIQA model that jointly exploits complementary facial views through cross-view feature interaction and employs learnable task tokens to guide the unified regression of multiple quality dimensions. Experimental results on the proposed dataset show that our method outperforms state-of-the-art general image quality assessment (IQA) and FIQA methods, validating its effectiveness for real-world surveillance applications.

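To make the idea of cross-view feature interaction and learnable task tokens more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify which facial views are used, how the interaction is computed, or how the regression heads are built, so the attention form, the feature size, the residual fusion, and all names here (`attend`, `assess`, `task_tokens`, `head_weights`) are assumptions made for illustration. The weights are random rather than trained, so the scores carry no meaning beyond showing the data flow.

```python
import math
import random

# The six rated dimensions named in the abstract.
DIMENSIONS = ["noise", "sharpness", "colorfulness", "contrast", "fidelity", "overall"]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys_values):
    """Scaled dot-product attention with keys == values and no projections
    (a simplification assumed for this sketch)."""
    scale = math.sqrt(len(keys_values[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys_values])
        out.append([sum(w * v[i] for w, v in zip(weights, keys_values))
                    for i in range(len(q))])
    return out


def add(a, b):
    """Element-wise sum of two equally shaped lists of vectors."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]


def assess(view_a, view_b, task_tokens, head_weights):
    """Return one score per quality dimension from two sets of view features."""
    # Cross-view interaction (assumed form): each view queries the other,
    # and the result is added back as a residual.
    fused_a = add(view_a, attend(view_a, view_b))
    fused_b = add(view_b, attend(view_b, view_a))
    features = fused_a + fused_b  # concatenate the feature vectors of both views
    # Each task token pools the fused features for its own dimension.
    pooled = attend(task_tokens, features)
    # A per-dimension linear head maps the pooled vector to a scalar score.
    return {name: dot(p, w) for name, p, w in zip(DIMENSIONS, pooled, head_weights)}


# Usage with random stand-ins for features, tokens, and head weights.
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


D = 8  # hypothetical feature size
scores = assess(rand_matrix(4, D), rand_matrix(4, D),
                rand_matrix(len(DIMENSIONS), D), rand_matrix(len(DIMENSIONS), D))
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In this sketch a single shared feature set serves all six dimensions, and only the task tokens and the per-dimension heads differ between tasks, which is one way a multi-task model can stay lightweight.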