When Models Fabricate Credentials: Measuring How Professional Identity Suppresses Honest Self-Representation

Language models produce authoritative, persuasive responses even when those responses rest on fabricated expertise. Measuring this propensity to fabricate directly across all domains is intractable, but AI identity disclosure provides a clean test: when a model assigned a professional persona is asked where its expertise comes from, it can either disclose its AI nature or fabricate a human professional history. Because the ground truth is known (the model is not a neurosurgeon), non-disclosure constitutes unambiguous fabrication. Using a factorial evaluation design, sixteen open-weight models (4B to 671B parameters) were audited under identical conditions across 19,200 trials. Under professional personas (neurosurgeon, financial advisor, classical musician), models that disclose their AI nature in 99.8-99.9% of interactions under neutral conditions instead fabricated professional credentials, training narratives, and embodied experiences. Fabrication rates varied in ways that model size did not predict: a 14B model disclosed in 61.4% of interactions, while a 70B model disclosed in just 4.1%. Disclosure also varied sharply by domain: at the first prompt, the Financial Advisor persona elicited 35.2% disclosure while the Neurosurgeon persona elicited only 3.6%, a 9.7-fold difference. Adding model identity as a predictor improved fit to the observations substantially more than adding parameter count (ΔR²_adj = 0.375 vs. 0.012). An additional experiment found that adding explicit disclosure permission to persona system prompts increased disclosure from 23.7% to 65.8%. This indicates that honest self-representation is a suppressed default rather than an absent capability: models can disclose, but do not when persona instructions are silent on self-disclosure. The propensity to fabricate expertise is therefore context-dependent rather than a stable model property, and managing it requires deliberate behavior design and domain-specific verification.
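
The ΔR²_adj comparison can be made concrete with nested least-squares fits. The following is a minimal Python sketch, not the study's analysis code. It assumes per-cell disclosure rates as the outcome, a persona-only baseline, treatment-coded model dummies standing in for "model identity", and log10 of parameter count standing in for "parameter count"; the actual specification may differ. The helper names (`adjusted_r2`, `dummies`), the model names, and every number in the table are hypothetical synthetic values chosen for illustration, not results from the study.

```python
import numpy as np


def adjusted_r2(y, X):
    """OLS fit with an intercept; returns 1 - (1 - R^2)(n - 1)/(n - p - 1)."""
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    centered = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def dummies(labels):
    """Treatment-coded indicator columns for a categorical factor (first sorted level dropped)."""
    levels = sorted(set(labels))
    return np.array([[float(lab == lev) for lev in levels[1:]] for lab in labels])


# Synthetic toy data (hypothetical, NOT the paper's results): 4 models x 3 personas.
params_b = {"model_a": 4, "model_b": 14, "model_c": 70, "model_d": 671}
model_effect = {"model_a": 0.20, "model_b": 0.60, "model_c": 0.10, "model_d": 0.30}
persona_effect = {"financial_advisor": 0.30, "classical_musician": 0.15, "neurosurgeon": 0.00}
# Small model-by-persona interaction whose row and column sums are zero, so the
# additive model fits closely but not perfectly. Order follows persona_effect.
interaction = {
    "model_a": [0.02, -0.02, 0.00],
    "model_b": [-0.02, 0.02, 0.00],
    "model_c": [0.00, 0.02, -0.02],
    "model_d": [0.00, -0.02, 0.02],
}

# One row per (model, persona) cell with its synthetic disclosure rate.
rows = [
    (m, p, model_effect[m] + pe + interaction[m][j])
    for m in model_effect
    for j, (p, pe) in enumerate(persona_effect.items())
]
models, personas, rates = zip(*rows)
y = np.array(rates)

# Nested designs: a shared baseline, then baseline + identity or baseline + size.
X_base = dummies(personas)
X_identity = np.column_stack([X_base, dummies(models)])
X_size = np.column_stack([X_base, np.log10([params_b[m] for m in models])])

base = adjusted_r2(y, X_base)
delta_identity = adjusted_r2(y, X_identity) - base
delta_size = adjusted_r2(y, X_size) - base
print(f"dR2_adj: identity={delta_identity:.3f}, size={delta_size:.3f}")
```

On this toy table the identity increment is large and the size increment is slightly negative: adjusted R² charges the three model dummies extra degrees of freedom but they capture nearly all between-model variance, while the single size term explains almost none of it because the synthetic effects are not monotone in size. The two increments are only comparable because both extended models share the same baseline and outcome.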
