Hand-crafted image quality metrics such as PSNR and SSIM are commonly used to evaluate model privacy risk under reconstruction attacks. Under these metrics, a reconstructed image judged similar to the original indicates greater privacy leakage, while one judged dissimilar overall indicates greater robustness against the attack. However, there is no guarantee that these metrics reflect human opinion well, and human opinion is the more trustworthy judgment of model privacy leakage. In this paper, we comprehensively study how faithfully these hand-crafted metrics capture human perception of private information in reconstructed images. On 5 datasets spanning natural images, faces, and fine-grained classes, we use 4 existing attack methods to reconstruct images from many different classification models and, for each reconstructed image, ask multiple human annotators to assess whether it is recognizable. Our studies reveal that the hand-crafted metrics correlate only weakly with human evaluation of privacy leakage, and that the metrics often contradict one another. These observations suggest that relying on the metrics currently used in the community is risky. To address this risk, we propose a learning-based measure called SemSim that evaluates the Semantic Similarity between the original and reconstructed images. SemSim is trained with a standard triplet loss, using an original image as the anchor, one of its recognizable reconstructed images as the positive sample, and an unrecognizable one as the negative sample. Because it is trained on human annotations, SemSim better reflects privacy leakage at the semantic level. We show that SemSim has a significantly higher correlation with human judgment than existing metrics. Moreover, this strong correlation generalizes to unseen datasets, models, and attack methods.
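For readers unfamiliar with the training objective, the following is a minimal sketch of a standard triplet loss on a single (anchor, positive, negative) triple. It is illustrative only and is not the SemSim implementation: the abstract does not specify the embedding network, the distance function, or the margin, so the Euclidean distance, the margin of 1.0, the function names, and the toy two-dimensional embeddings are all assumptions.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,  # assumed value; not given in the abstract
) -> float:
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Toy embeddings, made up for illustration (not data from the paper).
original = [0.0, 0.0]        # anchor: embedding of the original image
recognizable = [0.1, 0.0]    # positive: reconstruction annotators could recognize
unrecognizable = [3.0, 4.0]  # negative: reconstruction annotators could not recognize

# The positive is already much closer to the anchor than the negative,
# so the margin is satisfied and the loss is zero.
loss_satisfied = triplet_loss(original, recognizable, unrecognizable)

# Swapping the roles violates the margin and yields a positive loss.
loss_violated = triplet_loss(original, unrecognizable, recognizable)
```

Minimizing this quantity over many annotated triples pulls recognizable reconstructions toward their originals in embedding space and pushes unrecognizable ones away, which is what allows a distance in that space to serve as a semantic similarity score.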