The Ghost Couple: Correlated LLM Name Priors and Their Haunting of the Web and Academic Publishing

These names do not exist. Elena Vasquez and Marcus Chen have never lived, yet they appear as volcano experts, astronauts, thriller protagonists, podcast hosts, and academic co-authors across hundreds of independently produced AI-generated documents. We show that large language models do not merely default to high-probability individual names when generating fictional experts: they produce correlated character ensembles, pairs and trios whose co-occurrence rates far exceed chance and remain consistent across independent generations. These priors are model-family-specific (Claude: Elena Vasquez + Marcus Chen + Amara Okafor; Gemini: Aris Thorne + Lena Petrova; GPT: Elara Voss with no fixed partner) and version-specific, and they are actively suppressed at model release boundaries, leaving dateable behavioral fingerprints in the content the models produced. We document a downstream consequence at scale. On Zenodo, a CERN-operated repository that mints real DataCite DOIs, we identify 1,655 ghost-authored records that claim publication in nonexistent journals with fabricated publication dates. Server-side DataCite timestamps prove deliberate backdating, and 991 of these records were registered in a single month. Because the records carry real DOIs registered in DataCite, they are harvestable by any scholarly aggregator that ingests DOI metadata. Ghost names also appear on ResearchGate, where they form synthetic research groups with collaborators drawn from multiple model families; publication dates on these records provide a reliable temporal proxy for model deployment windows.
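To make the phrase "co-occurrence rates far exceed chance" concrete, the following is a minimal sketch of one way such a comparison could be computed: the ratio of observed pair co-occurrence to the count expected if the two names appeared independently (often called lift). The function name `pair_lift`, the substring-matching rule, and the six toy documents are illustrative assumptions, not the measurement procedure or data behind the figures reported above.

```python
from collections import Counter
from itertools import combinations


def pair_lift(documents, names):
    """Return observed / expected co-occurrence for every name pair seen together.

    Expected counts assume the two names occur independently across documents.
    A lift well above 1.0 suggests the names travel together more than chance.
    """
    n_docs = len(documents)
    single = Counter()  # number of documents containing each name
    pair = Counter()    # number of documents containing both names of a pair

    for text in documents:
        # Hypothetical matching rule: plain substring match on the full name.
        present = sorted(name for name in names if name in text)
        single.update(present)
        pair.update(combinations(present, 2))

    lift = {}
    for (a, b), joint in pair.items():
        expected = single[a] * single[b] / n_docs  # independence baseline
        lift[(a, b)] = joint / expected
    return lift


# Toy corpus invented for illustration only (not data from the study).
docs = [
    "Dr. Elena Vasquez and Marcus Chen studied the eruption.",
    "Elena Vasquez briefed Marcus Chen before launch.",
    "Marcus Chen hosted the show with Elena Vasquez.",
    "Aris Thorne reviewed the data.",
    "A report with no named experts.",
    "Another unrelated text.",
]
names = ["Elena Vasquez", "Marcus Chen", "Aris Thorne"]

lift = pair_lift(docs, names)
for names_pair, value in lift.items():
    print(names_pair, round(value, 2))
```

In this toy corpus each of the two names appears in three of six documents, so independence predicts 1.5 joint documents; they actually share three, giving a lift of 2.0. A real analysis would presumably need a much larger corpus, a more careful name matcher, and a significance test rather than a raw ratio.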
