Large language models trained on human feedback may suppress fraud warnings when investors arrive already persuaded of a fraudulent opportunity. We tested this in a preregistered experiment across seven leading LLMs and twelve investment scenarios covering legitimate, high-risk, and objectively fraudulent opportunities, combining 3,360 AI advisory conversations with a 1,201-participant human benchmark. Contrary to predictions, motivated investor framing did not suppress AI fraud warnings; if anything, it marginally increased them. Endorsement reversal occurred in fewer than 3 in 1,000 observations. Human advisors endorsed fraudulent investments at baseline rates of 13-14%, versus 0% across all LLMs, and suppressed warnings under pressure at two to four times the AI rate. AI systems currently provide more consistent fraud warnings than lay humans in an identical advisory role.