Which stylistic features fool ChatGPT research evaluations?

Large Language Models (LLMs) could be used to support research evaluation, and they have a moderate capability to estimate the research quality of a journal article from its title and abstract. This paper assesses whether language-related factors that are unrelated to research quality influence ChatGPT's scores. Using a dataset of 99,277 journal articles submitted to the UK-wide Research Excellence Framework (REF) 2021 assessments, we calculated several readability indicators from abstracts and correlated them with ChatGPT scores and departmental REF scores. In many subject areas, linguistic complexity and length were more strongly associated with ChatGPT research quality scores than with REF expert scores. Although cause-and-effect was not tested, these results suggest that ChatGPT may be more likely than human experts to reward linguistic complexity, with a potential bias towards longer and less readable abstracts in many fields. Unless solutions can be found, this apparent preference of LLMs for complex language is an undesirable feature for practical applications of LLMs to research quality evaluation.
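The abstract does not say which readability indicators or which correlation coefficient were used, so the following is only a minimal sketch of the general approach, under stated assumptions: it computes three simple indicators (word count, words per sentence, characters per word) and a Spearman rank correlation. The function names, the toy abstracts, and the toy scores are all hypothetical and are not taken from the paper.

```python
import re
from statistics import mean

def abstract_features(text):
    """Simple length/complexity indicators for one abstract (illustrative choices)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "n_words": len(words),
        "words_per_sentence": len(words) / max(len(sentences), 1),
        "chars_per_word": mean(len(w) for w in words) if words else 0.0,
    }

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks (assumed metric)."""
    return pearson(ranks(x), ranks(y))

if __name__ == "__main__":
    # Hypothetical toy data: (abstract, LLM score, expert score).
    toy = [
        ("We test a model. It works well.", 2.0, 3.0),
        ("This study investigates heterogeneous methodological frameworks "
         "across multiple interdisciplinary contexts.", 3.5, 2.5),
        ("We measure growth in three crops over two years and report "
         "yields for each site.", 3.0, 3.5),
        ("Results are shown.", 1.5, 2.0),
    ]
    lengths = [abstract_features(text)["n_words"] for text, _, _ in toy]
    llm_scores = [llm for _, llm, _ in toy]
    expert_scores = [expert for _, _, expert in toy]
    print("length vs LLM score:   ", spearman(lengths, llm_scores))
    print("length vs expert score:", spearman(lengths, expert_scores))
```

Comparing the two coefficients for each indicator, field by field, is the kind of contrast the abstract describes: a larger correlation with the LLM score than with the expert score would hint that the indicator influences the model more than it influences human assessors. On real data an established implementation such as `scipy.stats.spearmanr` would normally be preferable to the hand-written version here.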

