Portrait images typically consist of a salient person against diverse backgrounds. With the development of mobile devices and image processing techniques, users can conveniently capture portrait images anytime and anywhere. However, the quality of these portraits may suffer from degradation caused by unfavorable environmental conditions, subpar photography techniques, and inferior capture devices. In this paper, we introduce a dual-branch network for portrait image quality assessment (PIQA) that effectively models how the salient person and the background of a portrait image influence its visual quality. Specifically, we utilize two backbone networks (\textit{i.e.,} Swin Transformer-B) to extract quality-aware features from the entire portrait image and from the facial region cropped from it. To enhance the quality-aware feature representations of the backbones, we pre-train them on the large-scale video quality assessment dataset LSVQ and the large-scale facial image quality assessment dataset GFIQA. Additionally, we leverage LIQE, an image scene classification and quality assessment model, to capture quality-aware and scene-specific features as auxiliary features. Finally, we concatenate these features and regress them into quality scores via a multi-layer perceptron (MLP). To mitigate inconsistencies in the quality scores of the portrait image quality assessment dataset PIQ, we train the model in a learning-to-rank manner with the fidelity loss. Experimental results demonstrate that the proposed model achieves superior performance on the PIQ dataset, validating its effectiveness. The code is available at \url{https://github.com/sunwei925/DN-PIQA.git}.
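The pipeline above (concatenating the branch features, regressing a score with an MLP head, and ranking pairs with the fidelity loss) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimensions, the two-layer MLP head, and the Thurstone-style (unit-variance Gaussian) preference model used to turn score differences into pairwise probabilities are all assumptions for the sketch.

```python
import math

import numpy as np

def predict_score(full_feat, face_feat, liqe_feat, weights):
    """Concatenate the three branch features and regress a quality score
    with a two-layer MLP head (hypothetical weights; shapes illustrative)."""
    x = np.concatenate([full_feat, face_feat, liqe_feat])
    w1, b1, w2, b2 = weights
    h = np.maximum(x @ w1 + b1, 0.0)   # hidden layer with ReLU
    return float(h @ w2 + b2)          # scalar quality score

def predicted_preference(si, sj):
    """Probability that image i is preferred over image j, derived from the
    predicted scores under a Thurstone-style unit-variance Gaussian model:
    p_hat = Phi((si - sj) / sqrt(2))."""
    return 0.5 * (1.0 + math.erf((si - sj) / 2.0))

def fidelity_loss(p, p_hat, eps=1e-8):
    """Fidelity loss between the ground-truth preference p and the predicted
    preference p_hat; it approaches zero when the two agree."""
    return (1.0 - math.sqrt(p * p_hat + eps)
                - math.sqrt((1.0 - p) * (1.0 - p_hat) + eps))
```

Training in a learning-to-rank manner then means sampling image pairs, converting their annotated scores into a target preference `p`, and minimizing `fidelity_loss(p, predicted_preference(si, sj))` over the pairs, which sidesteps absolute-score inconsistencies across the dataset.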