Detection and Interpretability Analysis of Quotation Errors by Large Language Models

Purpose - A quotation error is an inconsistency between cited information and its original source. Such errors misrepresent the original research, undermine the academic community's collective understanding of the issues involved, and weaken the accuracy and fairness of citation-based academic evaluation. Existing studies show that quotation errors are prevalent in the academic literature, and manual verification is both labor-intensive and inefficient. This paper therefore proposes the task of automated detection of quotation errors.

Methodology - Building on existing research, this paper adopts a large language model (LLM)-based approach and improves detection performance in two ways: first, by fine-tuning LLMs to detect quotation errors; second, by incorporating full-text data from the cited literature into dataset construction and comparing three full-text integration methods to identify the best scheme for building such datasets. The paper then uses the TokenSHAP tool to conduct an interpretability analysis of the model's predictions.

Findings - Fine-tuning LLMs improved performance in detecting quotation errors. Among the methods compared for incorporating full-text information, the one that uses the source abstract performed best.

Originality - LLM fine-tuning is applied to the task of automated quotation error detection, and an interpretability analysis is conducted on the model's outputs.
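As a rough illustration of the dataset-construction idea, the sketch below pairs a citing statement with the abstract of its cited source and emits a prompt/completion record of the kind commonly used for supervised fine-tuning. The abstract does not specify the paper's data format, so every name here (`CitationPair`, `build_record`, the label strings, the prompt wording) is a hypothetical assumption, not the authors' implementation.

```python
import json
from dataclasses import dataclass

# Hypothetical label set; the paper's actual labels are not given in the abstract.
ALLOWED_LABELS = {"accurate", "error"}


@dataclass
class CitationPair:
    citing_statement: str  # sentence in the citing paper that refers to the source
    source_abstract: str   # abstract of the cited source (assumed context field)
    label: str             # one of ALLOWED_LABELS


def build_record(pair: CitationPair) -> dict:
    """Turn one labeled pair into a prompt/completion record (assumed format)."""
    if pair.label not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {pair.label!r}")
    prompt = (
        "Source abstract:\n"
        f"{pair.source_abstract.strip()}\n\n"
        "Citing statement:\n"
        f"{pair.citing_statement.strip()}\n\n"
        "Does the citing statement accurately reflect the source? "
        "Answer 'accurate' or 'error'."
    )
    return {"prompt": prompt, "completion": pair.label}


def to_jsonl(pairs: list[CitationPair]) -> str:
    """Serialize records one JSON object per line."""
    return "\n".join(
        json.dumps(build_record(p), ensure_ascii=False) for p in pairs
    )
```

Swapping `source_abstract` for a different slice of the cited paper would be one way to compare alternative full-text integration schemes under the same record layout.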

