Text sanitization, which uses differential privacy to replace sensitive tokens with new ones, is an important technique for privacy protection. Its privacy-preserving performance is typically evaluated by the attack success rate (ASR) of reconstruction attacks, in which an attacker attempts to recover the original tokens from the sanitized ones. However, existing reconstruction attacks on text sanitization are developed empirically, which makes it difficult to assess the effectiveness of sanitization accurately. In this paper, we aim to provide a more accurate evaluation of sanitization effectiveness. Inspired by the work of Palamidessi et al., we implement theoretically optimal reconstruction attacks targeting text sanitization and derive bounds on their ASR as benchmarks for evaluating sanitization performance. For real-world applications, we propose two practical reconstruction attacks based on these theoretical findings. Our experimental results underscore the need to reassess the overlooked privacy risks of text sanitization. Notably, one of our attacks achieves a 46.4% improvement in ASR over the state-of-the-art baseline at a privacy budget of ε = 4.0 on the SST-2 dataset. Our code is available at: https://github.com/mengtong0110/On-the-Vulnerability-of-Text-Sanitization.
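To make the idea of an optimal reconstruction attack concrete, the following is a minimal sketch and not the implementation from the paper. It assumes a toy four-token vocabulary with made-up 2-D embeddings, a hypothetical prior over tokens, and an exponential-style sanitization mechanism in which the probability of emitting token y for original token x is proportional to exp(-ε·d(x, y)/2). Under those assumptions, an attacker who knows the prior and the mechanism maximizes ASR by guessing, for each observed token, the original token with the highest posterior probability. All names (`sanitization_channel`, `bayes_optimal_attack`, `expected_asr`) are illustrative.

```python
import math

# Hypothetical toy vocabulary, embeddings, and prior (all assumptions).
vocab = ["good", "great", "bad", "awful"]
embeddings = [(0.0, 0.0), (0.5, 0.1), (3.0, 0.2), (3.4, 0.0)]
prior = [0.4, 0.3, 0.2, 0.1]  # assumed attacker knowledge of token frequencies


def sanitization_channel(embeddings, epsilon):
    """Return C[x][y] = P(sanitized token y | original token x).

    Assumed mechanism: weight exp(-epsilon * distance / 2), normalized per row.
    """
    n = len(embeddings)
    channel = []
    for x in range(n):
        weights = [
            math.exp(-0.5 * epsilon * math.dist(embeddings[x], embeddings[y]))
            for y in range(n)
        ]
        total = sum(weights)
        channel.append([w / total for w in weights])
    return channel


def bayes_optimal_attack(prior, channel):
    """For each observed token y, guess the x that maximizes prior[x] * C[x][y]."""
    n = len(prior)
    return [
        max(range(n), key=lambda x: prior[x] * channel[x][y])
        for y in range(n)
    ]


def expected_asr(prior, channel, guesses):
    """Expected probability that the guess for the observed token is correct."""
    return sum(
        prior[guesses[y]] * channel[guesses[y]][y] for y in range(len(prior))
    )


epsilon = 4.0
channel = sanitization_channel(embeddings, epsilon)

# Optimal attack versus a naive attack that assumes the token was unchanged.
optimal_guesses = bayes_optimal_attack(prior, channel)
naive_guesses = list(range(len(vocab)))

optimal_asr = expected_asr(prior, channel, optimal_guesses)
naive_asr = expected_asr(prior, channel, naive_guesses)

print(f"optimal ASR: {optimal_asr:.3f}, naive ASR: {naive_asr:.3f}")
```

In this sketch the expected ASR of the posterior-maximizing attacker is at least that of any other deterministic guessing rule, including the naive one, which is why such a quantity can serve as a benchmark for a sanitization mechanism. Real vocabularies, embeddings, and priors would of course change the numbers.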