Automatic radiology report generation can alleviate the workload of physicians and reduce regional disparities in medical resources, and has therefore become an important topic in medical image analysis. It is a challenging task, as the computational model needs to mimic physicians by extracting information from multi-modal input data (e.g., medical images, clinical information, and medical knowledge) and producing comprehensive and accurate reports. Recently, numerous works have emerged that address this problem with deep learning-based methods, such as transformers, contrastive learning, and knowledge-base construction. This survey summarizes the key techniques developed in the most recent works and proposes a general workflow for deep learning-based report generation with five main components: multi-modality data acquisition, data preparation, feature learning, feature fusion/interaction, and report generation. The state-of-the-art methods for each of these components are highlighted. Additionally, training strategies, public datasets, evaluation methods, current challenges, and future directions in this field are summarized. We have also conducted a quantitative comparison of different methods under the same experimental setting. This is the most up-to-date survey that focuses on multi-modality inputs and data fusion for radiology report generation. The aim is to provide comprehensive information for researchers interested in automatic clinical report generation and medical image analysis, especially those working with multi-modal inputs, and to assist them in developing new algorithms that advance the field.