This paper describes our submission to the MEDIQA-Chat 2023 shared task for automatic clinical note generation from doctor-patient conversations. We report results for two approaches: the first fine-tunes a pre-trained language model (PLM) on the shared task data, and the second uses few-shot in-context learning (ICL) with a large language model (LLM). Both achieve high performance as measured by automatic metrics (e.g., ROUGE, BERTScore) and rank second and first, respectively, among all submissions to the shared task. Expert human evaluation indicates that notes generated via the ICL-based approach with GPT-4 are preferred about as often as human-written notes, making it a promising path toward automated note generation from doctor-patient conversations.