Large Language Models (LLMs) like ChatGPT, DeepSeek and Gemini seem to be increasingly used for knowledge discovery, information retrieval, and knowledge summaries, including for academic topics. This can result in users being misled, for example by hallucinations. These problems may be exacerbated for academic knowledge if LLMs base their answers on journal article abstracts when they lack full text access. To test whether the information content of abstracts can be misleading, full text articles were submitted to GPT-OSS 120B, an LLM from OpenAI, asking it to assess separately the strength of the claims for the main result in the abstract, discussion, and conclusion. Outside the social sciences and humanities, claims tended to be stronger in the abstract and conclusions than in the discussion, suggesting that relying on the strength of claims in abstracts would be misleading. Thus, if LLMs ingest abstracts but not full texts, there is a risk that they will be overconfident about the findings and pass this overconfidence on to users in response to relevant prompts. This is another reason to be cautious about using LLMs for academic-related knowledge discovery and summaries.