Large-scale online deanonymization with LLMs

We show that large language models can be used to deanonymize users at scale. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants with high precision, given only pseudonymous online profiles and conversations, matching results that would take a dedicated human investigator hours to produce. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about each individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Unlike prior deanonymization work (e.g., on the Netflix Prize), which required structured data or manual feature engineering, our approach works directly on raw user content across arbitrary platforms. We construct three datasets with known ground truth to evaluate our attacks. The first links Hacker News accounts to LinkedIn profiles, using cross-platform references that appear in the profiles. The second matches users across Reddit movie discussion communities, and the third splits a single user's Reddit history in time to create two pseudonymous profiles to be matched. In each setting, LLM-based methods substantially outperform classical baselines, achieving up to 68% recall at 90% precision, compared to near 0% for the best non-LLM method. Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered.
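The three-stage structure of the closed-world pipeline can be illustrated with a minimal sketch. This is not the implementation evaluated in the paper: each LLM step is replaced by a simple stand-in so the example runs on its own. Keyword filtering stands in for LLM feature extraction, a bag-of-words count vector stands in for a semantic embedding, and a score-and-margin rule stands in for LLM reasoning over the top candidates. All function names, thresholds, and sample records are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would rely on an LLM instead.
STOPWORDS = {"a", "an", "and", "the", "i", "in", "on", "of", "to", "my", "is"}

def extract_features(text):
    """Stage 1 stand-in: keep content words (the paper uses an LLM here)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def embed(features):
    """Stage 2 stand-in: bag-of-words counts instead of a semantic embedding."""
    return Counter(features)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def top_candidates(query_vec, db_vecs, k=3):
    """Return the k most similar records as (score, name), best first."""
    scored = [(cosine(query_vec, vec), name) for name, vec in db_vecs.items()]
    return sorted(scored, reverse=True)[:k]

def verify(candidates, min_score=0.3, min_margin=0.1):
    """Stage 3 stand-in: accept only a clear winner, otherwise abstain.

    Abstaining (returning None) is what trades recall for precision.
    """
    if not candidates:
        return None
    best_score, best_name = candidates[0]
    runner_up = candidates[1][0] if len(candidates) > 1 else 0.0
    if best_score >= min_score and best_score - runner_up >= min_margin:
        return best_name
    return None

def link(db_a, db_b, k=3):
    """Match each record in db_a to a record in db_b, or to None."""
    vecs_b = {name: embed(extract_features(text)) for name, text in db_b.items()}
    return {
        name: verify(top_candidates(embed(extract_features(text)), vecs_b, k))
        for name, text in db_a.items()
    }

# Made-up sample records, for illustration only.
db_a = {
    "hn_user_1": "I maintain a Rust compiler plugin and live in Lisbon",
    "hn_user_2": "Thoughts on coffee",
}
db_b = {
    "profile_x": "Rust compiler engineer based in Lisbon",
    "profile_y": "Marketing manager who enjoys hiking and photography",
}
matches = link(db_a, db_b)
print(matches)
```

In this toy run the first record is linked to `profile_x` and the second is left unmatched because no candidate clears the score threshold. The abstention rule in `verify` plays the role of the false-positive filter in stage (3): raising `min_score` or `min_margin` yields fewer but more reliable matches.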
