The widespread adoption of face recognition (FR) technologies raises serious privacy concerns, as facial data can be exploited without consent. To address this challenge, we propose Adv-TGD, a generative adversarial attack framework that synthesizes photorealistic faces capable of impersonating target identities and deceiving face recognition systems. Built upon Stable Diffusion, Adv-TGD performs per-sample LoRA fine-tuning conditioned on concise textual prompts to generate natural yet adversarially manipulated identities. Unlike conventional identity-attack approaches, our method optimizes lightweight cross-attention adapters for each source-target pair within a single-step denoising process. Latent blending is constrained by a face-local heatmap mask, so that identity manipulation is spatially precise and non-sensitive regions are preserved. We introduce a composite objective that integrates masked epsilon-MSE reconstruction, thresholded identity divergence in FR embedding space, directional feature alignment, and source-similarity suppression to balance attack strength against visual realism. Optionally, LLaVA-generated attribute prompts enhance fine-grained semantic details without reintroducing identity cues. Under the black-box evaluation protocol, Adv-TGD attains an average attack success rate (ASR) of 85.90% across IR152, IRSE50, MobileFace, and FaceNet, surpassing the semantic state-of-the-art baseline Adv-CPG by 6.25 percentage points, the diffusion-based makeup method DiffAIM by 3 points, and the noise-based P3-Mask by 16 points. Despite its strong attack efficacy, Adv-TGD preserves high visual fidelity (PSNR = 27.15 dB, SSIM = 0.981). Furthermore, we demonstrate the flexibility of our framework by successfully extending it to in-the-wild datasets (LADN), general object classification (ImageNet), and transformer-based diffusion models (FLUX.1).
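To make the mask-constrained latent blending and the four-term objective more concrete, the following is a minimal sketch on plain Python lists. It is not the authors' implementation: the abstract does not give the exact formulas, so the hinge form of the identity term, the source-to-target direction used for alignment, the threshold `tau`, the loss weights, and all function names are assumptions made purely for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity with a small guard against zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def blend_latents(z_src, z_adv, mask):
    """Keep the source latent where mask = 0, the adversarial latent where mask = 1."""
    return [m * a + (1.0 - m) * s for s, a, m in zip(z_src, z_adv, mask)]

def masked_eps_mse(eps_pred, eps_true, mask):
    """Noise-prediction MSE averaged over the masked (face) region only."""
    num = sum(m * (p - t) ** 2 for p, t, m in zip(eps_pred, eps_true, mask))
    den = sum(mask)
    return num / den if den > 0 else 0.0

def composite_loss(eps_pred, eps_true, mask, e_adv, e_src, e_tgt,
                   tau=0.3, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of four assumed terms; tau and weights are hypothetical."""
    # 1) Masked epsilon-MSE reconstruction.
    l_rec = masked_eps_mse(eps_pred, eps_true, mask)
    # 2) Thresholded identity divergence: cosine distance to the target
    #    embedding, penalized only while it exceeds tau (assumed hinge form).
    l_id = max(0.0, (1.0 - cosine(e_adv, e_tgt)) - tau)
    # 3) Directional alignment: the shift away from the source embedding
    #    should point along the source-to-target direction (assumed form).
    d_adv = [a - s for a, s in zip(e_adv, e_src)]
    d_ref = [t - s for t, s in zip(e_tgt, e_src)]
    l_dir = 1.0 - cosine(d_adv, d_ref)
    # 4) Source-similarity suppression: penalize remaining positive
    #    similarity to the source identity.
    l_src = max(0.0, cosine(e_adv, e_src))
    w_rec, w_id, w_dir, w_src = weights
    return w_rec * l_rec + w_id * l_id + w_dir * l_dir + w_src * l_src
```

In this sketch, an adversarial embedding that coincides with the target and is orthogonal to the source drives the three identity terms to roughly zero, while an embedding that still equals the source is penalized by all three. In a real pipeline the embeddings would come from an FR model and the latents from the diffusion model's latent space, with tensors in place of lists.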