Differentiate ChatGPT-generated and Human-written Medical Texts

Background: Large language models such as ChatGPT can generate grammatically flawless, human-like text, and a large amount of ChatGPT-generated text has already appeared on the Internet. However, medical texts such as clinical notes and diagnoses require rigorous validation, and erroneous medical content generated by ChatGPT could spread disinformation that poses significant harm to healthcare and the general public.

Objective: This research is among the first studies on responsible and ethical AIGC (Artificial Intelligence Generated Content) in medicine. We focus on analyzing the differences between medical texts written by human experts and those generated by ChatGPT, and on designing machine learning workflows that effectively detect and differentiate ChatGPT-generated medical texts.

Methods: We first construct a suite of datasets containing medical texts written by human experts and generated by ChatGPT. We then analyze the linguistic features of the two types of content and uncover differences in vocabulary, part-of-speech, dependency, sentiment, perplexity, etc. Finally, we design and implement machine learning methods to detect medical text generated by ChatGPT.

Results: Medical texts written by humans are more concrete, more diverse, and typically contain more useful information. Medical texts generated by ChatGPT pay more attention to fluency and logic, and usually use general terminology rather than information specific to the context of the problem. A BERT-based model can effectively detect medical texts generated by ChatGPT, with an F1 score exceeding 95%.
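Two of the quantities mentioned above can be made concrete with a small sketch: a vocabulary-diversity measure and the F1 score used to report detection quality. This is only an illustration under stated assumptions, not the authors' pipeline. The type-token ratio is one common proxy for lexical diversity and is assumed here rather than taken from the paper, and the helper names `tokenize`, `type_token_ratio`, and `f1_score` are hypothetical. The label convention (1 = ChatGPT-generated, 0 = human-written) is also an assumption.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase word tokenizer; a simple stand-in for a real NLP tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def type_token_ratio(text: str) -> float:
    """Distinct tokens divided by total tokens (assumed diversity proxy)."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def f1_score(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """Binary F1 for the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In use, one might compare `type_token_ratio` averaged over the human-written and ChatGPT-generated subsets of a corpus, and call `f1_score(labels, predictions)` on the output of whatever detector is being evaluated. Length-sensitive measures such as the type-token ratio should be compared on texts of similar length.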

