Large Language Models (LLMs) can help with literature search and summarisation, but retracted articles can mislead them. This article asks three open-weight (offline) LLMs whether each of 161 high-profile retracted articles had been retracted, and performs a similar check on a multidisciplinary benchmark set of 34,070 non-retracted articles. Based on titles and abstracts, the LLMs claimed in over 80% of cases that a retracted article had not been retracted (GPT OSS 120B: 82%; Gemma 3 27B: 84%; DeepSeek R1 72B: 88%). Even when an LLM correctly declared an article retracted, the reasons it gave were often wrong, however detailed. This confirms that LLMs have little ability to distinguish between valid and retracted studies unless they are allowed to check online and actually do so. In the benchmark test, there were only 55 false retraction claims for the 34,070 non-retracted articles when full texts were entered, and 28 when only titles and abstracts were entered, suggesting that there is only a small chance that LLMs discount valid studies. When retractions are erroneously claimed, this does not seem to be due to mistakes in the article. Overall, the results give new reasons to be cautious about LLM claims about academic findings.
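To put the benchmark counts in proportion, the false retraction claims can be expressed as rates. The sketch below is only illustrative arithmetic on the figures quoted in the abstract. It assumes each count is measured against a single pass over the 34,070 articles; if the counts are pooled across all three models, the denominator would be three times larger and the rates correspondingly smaller. The function and variable names are hypothetical.

```python
# Figures quoted in the abstract.
NON_RETRACTED_ARTICLES = 34_070
FALSE_RETRACTION_CLAIMS = {
    "full text": 55,
    "title and abstract": 28,
}


def false_claim_rate(false_claims: int, total: int = NON_RETRACTED_ARTICLES) -> float:
    """Share of non-retracted articles wrongly reported as retracted.

    Assumption: `total` is one pass over the benchmark set, not a count
    pooled across several models.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return false_claims / total


if __name__ == "__main__":
    for input_type, count in FALSE_RETRACTION_CLAIMS.items():
        rate = false_claim_rate(count)
        print(f"{input_type}: {count} / {NON_RETRACTED_ARTICLES} = {rate:.3%}")
```

Under that assumption both rates fall well below 1%, which is consistent with the abstract's reading that LLMs rarely discount valid studies.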