Language models produce authoritative, persuasive responses even when those responses rest on fabricated expertise. Measuring this fabrication propensity directly across all domains is intractable, but AI identity disclosure provides a clean test: when a model assigned a professional persona is asked where its expertise comes from, it can either disclose its AI nature or fabricate a human professional history. Because the ground truth is known (the model is not a neurosurgeon), non-disclosure constitutes unambiguous fabrication. Using a factorial evaluation design, sixteen open-weight models (4B to 671B parameters) were audited under identical conditions across 19,200 trials. Models that disclose their AI nature in 99.8-99.9% of interactions under neutral conditions instead fabricated professional credentials, training narratives, and embodied experiences when assigned professional personas (neurosurgeon, financial advisor, classical musician). Fabrication rates varied unpredictably: a 14B model disclosed in 61.4% of interactions, while a 70B model disclosed in just 4.1%. Disclosure was also strongly domain-specific: at the first prompt, the Financial Advisor persona elicited 35.2% disclosure while the Neurosurgeon persona elicited only 3.6%, a 9.7-fold difference. Model identity improved model fit far more than parameter count did (ΔR²_adj = 0.375 vs. 0.012). A further experiment found that adding explicit disclosure permission to persona system prompts increased disclosure from 23.7% to 65.8%. This indicates that honest self-representation is a suppressed default rather than an absent capability: models can disclose, but do not when persona instructions are silent on self-disclosure. The propensity to fabricate expertise is context-dependent rather than a stable model property, and it requires deliberate behavior design and domain-specific verification.
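The ΔR²_adj comparison can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit of a per-condition disclosure rate, where a baseline of persona-domain dummies is extended either with model identity (one-hot dummies) or with parameter count (a single numeric predictor). The abstract does not specify the actual regression, so the data, model names, and helper functions below (`r_squared`, `adjusted_r2`, `delta_adj_r2`, `build_design`) are hypothetical and chosen only to show how the quantity is computed.

```python
from typing import List, Sequence

# Hypothetical toy data (NOT the study's results): 3 models x 3 persona domains.
# Each model maps to (size in billions of parameters, additive effect on disclosure rate).
MODELS = {"model_a": (4.0, 0.10), "model_b": (14.0, 0.60), "model_c": (70.0, 0.05)}
DOMAIN_EFFECT = {"domain_1": 0.0, "domain_2": 0.3, "domain_3": 0.6}


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def r_squared(rows: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Fit y on rows by OLS via the normal equations (intercept added here); return R^2."""
    X = [[1.0, *row] for row in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(b * xi for b, xi in zip(beta, x))) ** 2 for x, yi in zip(X, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Penalize R^2 for p predictors (intercept excluded) fitted on n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def delta_adj_r2(base_rows, full_rows, y) -> float:
    """Gain in adjusted R^2 when moving from the base design to the full design."""
    n = len(y)
    adj_base = adjusted_r2(r_squared(base_rows, y), n, len(base_rows[0]))
    adj_full = adjusted_r2(r_squared(full_rows, y), n, len(full_rows[0]))
    return adj_full - adj_base


def build_design():
    """Build base, base+params, and base+identity design rows plus the response."""
    names, domains = list(MODELS), list(DOMAIN_EFFECT)
    base, with_params, with_identity, y = [], [], [], []
    for name in names:
        size_b, model_eff = MODELS[name]
        for dom in domains:
            dom_dummies = [1.0 if dom == d else 0.0 for d in domains[1:]]  # first domain is reference
            model_dummies = [1.0 if name == m else 0.0 for m in names[1:]]  # first model is reference
            base.append(dom_dummies)
            with_params.append(dom_dummies + [size_b])
            with_identity.append(dom_dummies + model_dummies)
            y.append(model_eff + DOMAIN_EFFECT[dom])
    return base, with_params, with_identity, y


if __name__ == "__main__":
    base, with_params, with_identity, y = build_design()
    print(f"delta adj R2, parameter count: {delta_adj_r2(base, with_params, y):.3f}")
    print(f"delta adj R2, model identity:  {delta_adj_r2(base, with_identity, y):.3f}")
```

In this toy setup, per-model effects are deliberately non-monotone in size. As a result, identity dummies capture the model-level variation that a single linear size term cannot, which is the kind of pattern a large gap in ΔR²_adj would reflect.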