As large language models (LLMs) rapidly advance and enter real-world use, their privacy implications grow increasingly important. We study an authorship de-anonymization threat: using LLMs to link anonymous documents to their authors, potentially compromising settings such as double-blind peer review. We propose De-Anonymization at Scale (DAS), an LLM-based method for attributing authorship among tens of thousands of candidate texts. DAS uses a sequential progression strategy: it randomly partitions the candidate corpus into fixed-size groups, prompts an LLM to select from each group the text most likely written by the same author as a query text, and iteratively re-queries the surviving candidates to produce a ranked top-k list. To make this practical at scale, DAS adds a dense-retrieval prefilter that shrinks the search space and a majority-voting-style aggregation over multiple independent runs that improves robustness and ranking precision. Experiments on anonymized review data show that DAS can recover same-author texts from pools of tens of thousands with accuracy well above chance, demonstrating a realistic privacy risk for anonymous platforms. On standard authorship benchmarks (Enron emails and blog posts), DAS also improves both accuracy and scalability over prior approaches, highlighting a new LLM-enabled de-anonymization vulnerability.
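The sequential progression and voting aggregation described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names (`das_rank`, `das_vote`) are invented here, and a simple `score(query, text)` callback stands in for the LLM prompt that picks the group member most likely written by the query's author. In the real system a dense-retrieval prefilter would first shrink `candidates` before this tournament runs.

```python
import random
from collections import Counter

def das_rank(query, candidates, score, group_size=10, top_k=5, seed=0):
    """One run of the sequential progression strategy (sketch).

    `score(query, text) -> float` is a stand-in for the LLM judgment
    that selects, within each group, the text most likely written by
    the same author as `query`.
    """
    rng = random.Random(seed)
    survivors = list(candidates)
    # Tournament rounds: partition survivors into fixed-size groups and
    # keep each group's winner until one group's worth remains.
    while len(survivors) > group_size:
        rng.shuffle(survivors)
        groups = [survivors[i:i + group_size]
                  for i in range(0, len(survivors), group_size)]
        survivors = [max(g, key=lambda t: score(query, t)) for g in groups]
    # Final round: rank the remaining pool and return a top-k list.
    return sorted(survivors, key=lambda t: score(query, t), reverse=True)[:top_k]

def das_vote(query, candidates, score, runs=5, top_k=5, **kw):
    """Majority-voting-style aggregation over independent runs (sketch):
    candidates that survive into the top-k list across many random
    partitions are ranked by how often they appear."""
    votes = Counter()
    for r in range(runs):
        for text in das_rank(query, candidates, score, top_k=top_k, seed=r, **kw):
            votes[text] += 1
    return [text for text, _ in votes.most_common(top_k)]
```

The stub `score` makes the sketch self-contained; swapping in an actual LLM call (and weighting votes by rank rather than counting appearances) are the obvious refinements the abstract's description leaves open.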