Large Language Model Benchmarks in Medical Tasks

Lawrence K. Q. Yan, Qian Niu, Ming Li, Yichao Zhang, Caitlyn Heqi Yin, Cheng Fei, Benji Peng, Ziqian Bi, Pohsun Feng, Keyu Chen, Tianyang Wang, Yunze Wang, Silin Chen, Ming Liu, Junyu Liu

From arXiv, 25 pages, 5 tables

With the increasing application of large language models (LLMs) in the medical domain, evaluating these models' performance on benchmark datasets has become crucial. This paper presents a comprehensive survey of the benchmark datasets employed in medical LLM tasks. The datasets span text, image, and multimodal benchmarks and cover different aspects of medical knowledge, such as electronic health records (EHRs), doctor-patient dialogues, medical question answering, and medical image captioning. The survey categorizes the datasets by modality and discusses their significance, data structure, and impact on the development of LLMs for clinical tasks such as diagnosis, report generation, and predictive decision support. Key benchmarks include MIMIC-III, MIMIC-IV, BioASQ, PubMedQA, and CheXpert, which have facilitated advances in tasks like medical report generation, clinical summarization, and synthetic data generation. The paper summarizes the challenges and opportunities in leveraging these benchmarks to advance multimodal medical intelligence, emphasizing the need for datasets with greater language diversity, structured omics data, and innovative approaches to synthesis. This work also provides a foundation for future research on the application of LLMs in medicine, contributing to the evolving field of medical artificial intelligence.


