Neural network-based image classifiers are powerful tools for computer vision tasks, but they inadvertently reveal sensitive attribute information about their classes, raising privacy concerns. To investigate this privacy leakage, we introduce the first Class Attribute Inference Attack (CAIA), which leverages recent advances in text-to-image synthesis to infer sensitive attributes of individual classes in a black-box setting, while remaining competitive with related white-box attacks. Our extensive experiments in the face recognition domain show that CAIA can accurately infer undisclosed sensitive attributes, such as an individual's hair color, gender, and racial appearance, which are not part of the training labels. Interestingly, we demonstrate that adversarially robust models are even more vulnerable to such privacy leakage than standard models, indicating that a trade-off between robustness and privacy exists.