Large language models trained on human feedback may suppress fraud warnings when investors arrive already persuaded of a fraudulent opportunity. We tested this in a preregistered experiment across seven leading LLMs and twelve investment scenarios spanning legitimate, high-risk, and objectively fraudulent opportunities, combining 3,360 AI advisory conversations with a 1,201-participant human benchmark. Contrary to predictions, motivated investor framing did not suppress AI fraud warnings; if anything, it marginally increased them. Endorsement reversal, in which an advisor flips from warning against a fraud to endorsing it, occurred in fewer than 3 in 1,000 observations. Human advisors endorsed fraudulent investments at baseline rates of 13-14%, versus 0% across all LLMs, and suppressed warnings under pressure at two to four times the AI rate. In an identical advisory role, AI systems currently provide more consistent fraud warnings than lay humans.