Authorship obfuscation techniques hold the promise of helping people protect their privacy in online communications by automatically rewriting text to hide the identity of the original author. However, obfuscation has been evaluated only in narrow settings in the NLP literature and has primarily been addressed with superficial edit operations that can lead to unnatural outputs. In this work, we introduce an automatic text privatization framework that fine-tunes a large language model via reinforcement learning to produce rewrites that balance soundness, sense, and privacy. We evaluate it extensively on a large-scale test set of short- to medium-length English Reddit posts by 68k authors. We study how performance varies across evaluation conditions, including authorial profile length and authorship detection strategy. Our method maintains high text quality according to both automated metrics and human evaluation, and successfully evades several automated authorship attacks.