Voice anonymization aims to conceal speaker identity and attributes while preserving intelligibility, but current evaluations rely almost exclusively on the Equal Error Rate (EER), a summary metric that obscures whether adversaries can mount high-precision attacks. We argue that privacy should instead be evaluated in the low false-positive-rate (FPR) regime, where even a small number of successful identifications constitutes a meaningful breach. To this end, we introduce VoxGuard, a framework grounded in differential privacy and membership inference that formalizes two complementary notions: User Privacy, which prevents speaker re-identification, and Attribute Privacy, which protects sensitive traits such as gender and accent. Across synthetic and real datasets, we find that informed adversaries, especially those using fine-tuned models and max-similarity scoring, achieve orders-of-magnitude stronger attacks at low FPR despite similar EERs. For attributes, we show that simple, transparent attacks recover gender and accent with near-perfect accuracy even after anonymization. Our results demonstrate that EER substantially underestimates leakage, highlight the need for low-FPR evaluation, and we recommend VoxGuard as a benchmark for evaluating privacy leakage.
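The central claim, that two attackers with similar EERs can behave very differently at low FPR, can be illustrated with synthetic scores. The sketch below is not the VoxGuard implementation; the attacker score distributions are invented for illustration only:

```python
import numpy as np

def roc_points(pos, neg):
    """Empirical (FPR, TPR) at every observed threshold; higher score = 'same speaker'."""
    thresholds = np.sort(np.concatenate([pos, neg]))[::-1]
    tpr = np.array([(pos >= t).mean() for t in thresholds])
    fpr = np.array([(neg >= t).mean() for t in thresholds])
    return fpr, tpr

def eer(fpr, tpr):
    """Equal Error Rate: operating point where FPR crosses FNR = 1 - TPR."""
    idx = np.argmin(np.abs(fpr - (1.0 - tpr)))
    return (fpr[idx] + (1.0 - tpr[idx])) / 2.0

def tpr_at_fpr(fpr, tpr, target=1e-3):
    """Best TPR achievable while keeping FPR <= target (0.0 if unattainable)."""
    mask = fpr <= target
    return float(tpr[mask].max()) if mask.any() else 0.0

rng = np.random.default_rng(0)
neg = rng.normal(0.0, 1.0, 5000)  # scores on non-matching (different-speaker) trials

# Hypothetical attacker A: uniformly moderate confidence on matching trials.
pos_a = rng.normal(1.0, 1.0, 1000)
# Hypothetical attacker B: weak on most matches, but near-certain on a subset
# of speakers -- the kind of high-precision attack that EER averages away.
pos_b = np.concatenate([rng.normal(0.9, 1.0, 700), rng.normal(8.0, 0.5, 300)])

results = {}
for name, pos in [("A", pos_a), ("B", pos_b)]:
    fpr, tpr = roc_points(pos, neg)
    results[name] = (eer(fpr, tpr), tpr_at_fpr(fpr, tpr))
    print(f"attacker {name}: EER={results[name][0]:.3f}  "
          f"TPR@FPR<=1e-3={results[name][1]:.3f}")
```

Under these invented distributions the two attackers' EERs come out close, while attacker B identifies a far larger fraction of speakers at FPR ≤ 10⁻³, which is exactly the gap the abstract argues EER-only evaluation conceals.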