Authorship Attribution (AA) and Authorship Obfuscation (AO) are two competing tasks of increasing importance in privacy research. Modern AA leverages an author's consistent writing style to match a text to its author using an AA classifier. AO is the corresponding adversarial task: it aims to modify a text so that its semantics are preserved, yet an AA model cannot correctly infer its authorship. To address the privacy concerns raised by state-of-the-art (SOTA) AA methods, new AO methods have been proposed, but they remain largely impractical because their training and obfuscation are prohibitively slow, often taking hours. In response, we propose ALISON, a practical AO method that (1) dramatically reduces training and obfuscation time, obfuscating more than 10x faster than SOTA AO methods; (2) achieves better obfuscation success when attacking three transformer-based AA methods on two benchmark datasets, typically performing 15% better than competing methods; (3) does not require direct signals from a target AA classifier during obfuscation; and (4) utilizes unique stylometric features, allowing sound model interpretation for explainable obfuscation. We also demonstrate that ALISON can effectively prevent four SOTA AA methods from accurately determining the authorship of ChatGPT-generated texts, while changing the semantics of the original text only minimally. To ensure the reproducibility of our findings, our code and data are available at: https://github.com/EricX003/ALISON.