Generative Pre-Training (GPT) models such as ChatGPT have demonstrated exceptional performance on a variety of Natural Language Processing (NLP) tasks. Although ChatGPT has been integrated into workflows in many domains to boost efficiency, the inflexibility of its fine-tuning process hinders its application in areas that demand extensive domain expertise and semantic knowledge, such as healthcare. In this paper, we evaluate ChatGPT on the China National Medical Licensing Examination (CNMLE) and propose a novel approach to improve ChatGPT from two perspectives: integrating medical domain knowledge and enabling few-shot learning. Using a simple but effective retrieval method, we extract medical background knowledge as semantic instructions to guide ChatGPT's inference. Similarly, we identify relevant medical questions and feed them to ChatGPT as demonstrations. Experimental results show that ChatGPT applied directly fails to pass the CNMLE, scoring 51 (i.e., only 51% of questions are answered correctly). In contrast, our knowledge-enhanced model achieves a score of 70 on CNMLE-2022, which not only passes the examination but also surpasses the average human score (61). This research demonstrates the potential of knowledge-enhanced ChatGPT to serve as a versatile medical assistant, capable of analyzing real-world medical problems in a more accessible, user-friendly, and adaptable manner.
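The two enhancements described in the abstract, retrieved background knowledge and retrieved demonstration questions, can be illustrated with a minimal prompt-assembly sketch. The abstract does not specify the retrieval method, so the token-overlap scoring below is a stand-in chosen only for simplicity, and every name here (`overlap_score`, `retrieve`, `build_prompt`, the prompt layout, and the toy data) is hypothetical rather than taken from the paper. No model is called; the sketch only builds the text that would be sent to one.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for a real tokenizer."""
    return text.lower().split()


def overlap_score(query, candidate):
    """Count tokens shared by query and candidate (multiset intersection)."""
    q, c = Counter(tokenize(query)), Counter(tokenize(candidate))
    return sum((q & c).values())


def retrieve(query, corpus, k):
    """Return the k corpus entries with the highest overlap score."""
    ranked = sorted(corpus, key=lambda doc: overlap_score(query, doc), reverse=True)
    return ranked[:k]


def build_prompt(question, knowledge_base, question_bank, k_knowledge=1, k_demos=1):
    """Assemble a prompt from retrieved knowledge and solved example questions.

    knowledge_base: list of background-knowledge strings (assumed format).
    question_bank: dict mapping a solved question to its answer (assumed format).
    """
    knowledge = retrieve(question, knowledge_base, k_knowledge)
    demos = retrieve(question, list(question_bank), k_demos)

    parts = ["Background knowledge:"]
    parts += [f"- {fact}" for fact in knowledge]
    parts.append("Solved examples:")
    for demo in demos:
        parts.append(f"Q: {demo}\nA: {question_bank[demo]}")
    # The target question comes last, with the answer left open for the model.
    parts.append(f"Q: {question}\nA:")
    return "\n".join(parts)


# Toy placeholder data for illustration only; not drawn from the CNMLE.
knowledge_base = [
    "Metformin is a first-line drug for type 2 diabetes.",
    "Aspirin inhibits platelet cyclooxygenase.",
]
question_bank = {
    "Which drug inhibits cyclooxygenase? A. Aspirin B. Insulin": "A",
    "Which drug is first-line for type 2 diabetes? A. Metformin B. Heparin": "A",
}
question = "Which first-line drug treats type 2 diabetes? A. Warfarin B. Metformin"

print(build_prompt(question, knowledge_base, question_bank))
```

In practice, a stronger retriever (for example, one based on dense embeddings) could replace `overlap_score` without changing the prompt-assembly step.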