The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians, and radiologists typically write it based on the 'Findings' section. However, writing large numbers of impressions is laborious and error-prone. Recent studies have achieved promising results in automatic impression generation by pre-training and fine-tuning language models on large-scale medical text, but such models often require substantial amounts of medical text data and generalize poorly. Large language models (LLMs) such as ChatGPT have shown strong generalization capabilities, yet their performance in specific domains such as radiology remains under-investigated and potentially limited. To address this limitation, we propose ImpressionGPT, which leverages the in-context learning capability of LLMs by constructing dynamic contexts from domain-specific, individualized data. This dynamic prompt approach enables the model to learn contextual knowledge from semantically similar examples in existing data. Additionally, we design an iterative optimization algorithm that automatically evaluates the generated impressions and composes corresponding instruction prompts to further optimize the model. The proposed ImpressionGPT model achieves state-of-the-art performance on both the MIMIC-CXR and OpenI datasets without requiring additional training data or fine-tuning of the LLMs. This work presents a paradigm for localizing LLMs that can be applied in a wide range of similar application scenarios, bridging the gap between general-purpose LLMs and the specific language processing needs of various domains.
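The two mechanisms described above, a dynamic prompt built from semantically similar examples and an iterative loop that scores each draft and feeds the result back as an instruction, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the similarity measure, the evaluation metric, or the prompt wording, so everything here is an assumption: bag-of-words cosine similarity stands in for semantic retrieval, a token-overlap F1 against the retrieved examples' impressions stands in for the automatic evaluation, and `llm` is a hypothetical callable that maps a prompt string to a completion. All function names, the threshold, and the round limit are illustrative.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (findings, impression) pair from existing reports


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Bag-of-words cosine similarity (assumed stand-in for a semantic encoder).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_similar(findings: str, corpus: Sequence[Example], k: int = 2) -> List[Example]:
    # Rank existing reports by similarity of their findings to the query.
    query = Counter(tokenize(findings))
    ranked = sorted(
        corpus,
        key=lambda ex: cosine(query, Counter(tokenize(ex[0]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(findings: str, examples: Sequence[Example], feedback: Sequence[str] = ()) -> str:
    # Dynamic context: retrieved examples first, then any feedback instructions.
    parts = ["Write the Impression for the final Findings section."]
    parts += [f"Findings: {f}\nImpression: {i}" for f, i in examples]
    parts += list(feedback)
    parts.append(f"Findings: {findings}\nImpression:")
    return "\n\n".join(parts)


def overlap_f1(candidate: str, reference: str) -> float:
    # Token-overlap F1 (assumed stand-in for the paper's evaluation metric).
    c, r = Counter(tokenize(candidate)), Counter(tokenize(reference))
    common = sum((c & r).values())
    if common == 0:
        return 0.0
    precision = common / sum(c.values())
    recall = common / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def generate_impression(
    findings: str,
    corpus: Sequence[Example],
    llm: Callable[[str], str],  # hypothetical: prompt in, completion out
    k: int = 2,
    max_rounds: int = 3,
    threshold: float = 0.5,
) -> str:
    examples = retrieve_similar(findings, corpus, k)
    feedback: List[str] = []
    best, best_score = "", -1.0
    for _ in range(max_rounds):
        draft = llm(build_prompt(findings, examples, feedback)).strip()
        # Score the draft against the impressions of the retrieved examples.
        score = max((overlap_f1(draft, imp) for _, imp in examples), default=0.0)
        if score > best_score:
            best, best_score = draft, score
        if score >= threshold:
            break
        # Compose an instruction from the low-scoring draft for the next round.
        feedback.append(f"A previous draft was judged a poor response; avoid answers like: {draft}")
    return best
```

In this sketch the LLM's weights are never updated; only the prompt changes between rounds, which mirrors the abstract's claim that no additional training data or fine-tuning is required.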