Truthful Text Sanitization Guided by Inference Attacks

The purpose of text sanitization is to rewrite the text spans in a document that may directly or indirectly identify an individual, so that they no longer disclose personal information. Text sanitization must strike a balance between preventing the leakage of personal information (privacy protection) and retaining as much of the document's original content as possible (utility preservation). We present an automated text sanitization strategy based on generalizations, that is, more abstract (but still informative) terms that subsume the semantic content of the original text spans. The approach relies on instruction-tuned large language models (LLMs) and is divided into two stages. In the first stage, the LLM is used to obtain truth-preserving replacement candidates and to rank them according to their abstraction level. In the second stage, those candidates are evaluated for their ability to protect privacy by conducting inference attacks with the LLM, and the system selects the most informative replacement shown to be resistant to those attacks. As a consequence of this two-stage process, the chosen replacements effectively balance utility and privacy. We also present novel metrics to automatically evaluate these two aspects without the need to manually annotate data. Empirical results on the Text Anonymization Benchmark show that the proposed approach leads to enhanced utility, with only a marginal increase in the risk of re-identifying protected individuals compared to fully suppressing the original information. Furthermore, the selected replacements are shown to be more truth-preserving and abstractive than those produced by previous methods.
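The selection procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name select_replacement, the two callables generate and attack (which would each wrap a prompt to an instruction-tuned LLM), the exact-match test for a successful attack, and the "***" suppression fallback are all assumptions made for illustration. The toy stand-ins at the bottom replace the LLM calls with hard-coded behaviour so that the example runs on its own.

```python
from typing import Callable, Sequence

# Hypothetical signatures: in practice each callable would wrap an LLM prompt.
GenerateFn = Callable[[str, str], Sequence[str]]  # (document, span) -> candidates, most specific first
AttackFn = Callable[[str], Sequence[str]]         # (sanitized document) -> guesses for the hidden span


def select_replacement(
    document: str,
    span: str,
    generate: GenerateFn,
    attack: AttackFn,
    fallback: str = "***",  # assumed: full suppression when no candidate resists the attack
) -> str:
    """Return the most informative candidate that the attack fails to reverse."""
    # Stage 1: truth-preserving generalizations, ordered from least to most abstract.
    candidates = generate(document, span)
    # Stage 2: try each candidate in order and keep the first one that resists the attack.
    for candidate in candidates:
        sanitized = document.replace(span, candidate)
        guesses = attack(sanitized)
        # Assumed success criterion: the attack recovers the original span exactly.
        recovered = any(g.strip().lower() == span.lower() for g in guesses)
        if not recovered:
            return candidate
    return fallback


# Toy stand-ins for the two LLM calls (hard-coded, for illustration only).
def toy_generate(document: str, span: str) -> Sequence[str]:
    return ["the capital of Norway", "a Norwegian city", "a European city"]


def toy_attack(sanitized: str) -> Sequence[str]:
    if "capital of Norway" in sanitized:
        return ["Oslo"]
    return []


result = select_replacement(
    "The applicant was born in Oslo in 1975.", "Oslo", toy_generate, toy_attack
)
print(result)
```

With these toy stand-ins, the first candidate is rejected because the simulated attack recovers "Oslo" from it, and the next candidate in the ranking is kept. Iterating from the most specific candidate to the most abstract one is what makes the first surviving candidate the most informative one.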
