The Potential of LLMs in Medical Education: Generating Questions and Answers for Qualification Exams

Recent research on large language models (LLMs) has focused primarily on adapting and applying them in specialized domains. In medicine, applications concentrate on tasks such as automated medical report generation, summarization, diagnostic reasoning, and question answering between doctors and patients. Becoming a good teacher is harder than becoming a good student, and this study pioneers the application of LLMs in medical education. We investigate the extent to which LLMs can generate medical qualification exam questions and corresponding answers from few-shot prompts. Using a real-world Chinese dataset of elderly chronic diseases, we tasked eight widely used LLMs (ERNIE 4, ChatGLM 4, Doubao, Hunyuan, Spark 4, Qwen, Llama 3, and Mistral) with generating open-ended questions and answers based on a sampled subset of admission reports. We then engaged medical experts to manually evaluate these questions and answers across multiple dimensions. We found that, given few-shot prompts, LLMs can effectively mimic real-world medical qualification exam questions, but the correctness, evidence-based statements, and professionalism of the generated answers leave room for improvement. LLMs also show a decent ability to correct reference answers. Given the immense potential of artificial intelligence in medicine, generating qualification exam questions and answers for medical students, interns, and residents could be a significant focus of future research.
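As a rough illustration of the few-shot setup described in the abstract, the Python sketch below assembles a prompt from a few example question-answer pairs and one admission report. The `Example` class, the `build_few_shot_prompt` function, and the template wording are all assumptions made for illustration; the abstract does not specify the actual prompt, the number of examples, or how requests were sent to each model. Placeholder strings stand in for real exam items and report text.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One few-shot demonstration: an exam-style question and its reference answer."""
    question: str
    answer: str


def build_few_shot_prompt(examples: list[Example], admission_report: str) -> str:
    """Assemble a plain-text few-shot prompt (hypothetical template, not the study's)."""
    parts = [
        "You are writing open-ended medical qualification exam questions.",
        "Follow the style of the demonstrations, then write one new question "
        "and its answer based only on the admission report.",
        "",
    ]
    # Each demonstration shows the model the expected question/answer format.
    for i, ex in enumerate(examples, start=1):
        parts.append(f"Example {i}")
        parts.append(f"Question: {ex.question}")
        parts.append(f"Answer: {ex.answer}")
        parts.append("")
    # The report comes last so the model continues with a new question.
    parts.append("Admission report:")
    parts.append(admission_report.strip())
    parts.append("")
    parts.append("Question:")
    return "\n".join(parts)


# Placeholders only: real demonstrations would be actual exam items.
examples = [
    Example("<exam-style question 1>", "<reference answer 1>"),
    Example("<exam-style question 2>", "<reference answer 2>"),
]
prompt = build_few_shot_prompt(examples, "<de-identified admission report text>")
print(prompt)
```

The resulting string could be passed to any of the evaluated models through its own client; the generated question and answer would then go to expert reviewers, as the abstract describes.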


