A significant and growing number of published scientific articles are found to involve fraudulent practices, posing a serious threat to the credibility and safety of research in fields such as medicine. We propose Pub-Guard-LLM, the first large language model-based system tailored to fraud detection for biomedical scientific articles. We provide three application modes for deploying Pub-Guard-LLM: vanilla reasoning, retrieval-augmented generation, and multi-agent debate. Each mode produces textual explanations of its predictions. To assess the performance of our system, we introduce an open-source benchmark, PubMed Retraction, comprising over 11K real-world biomedical articles, including metadata and retraction labels. We show that, across all modes, Pub-Guard-LLM consistently surpasses the performance of various baselines and provides more reliable explanations, namely explanations which are deemed more relevant and coherent than those generated by the baselines when evaluated by multiple assessment methods. By enhancing both detection performance and explainability in scientific fraud detection, Pub-Guard-LLM contributes to safeguarding research integrity with a novel, effective, open-source tool.