Developing generalist foundation models has recently attracted tremendous attention among researchers in the field of AI for Medicine (AI4Medicine). A pivotal insight in developing these models is their reliance on dataset scaling, which underscores the need for open-source medical image datasets that incorporate diverse supervision signals across various imaging modalities. In this paper, we introduce RadGenome-Chest CT, a comprehensive, large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE. Specifically, we leverage the latest powerful universal segmentation and large language models to extend the original dataset (25,692 non-contrast 3D chest CT volumes and reports from 20,000 patients) in the following aspects: (i) organ-level segmentation masks covering 197 categories, which provide intermediate visual clues for interpretation reasoning; (ii) 665K multi-granularity grounded reports, in which each sentence of the report is linked to the corresponding anatomical region of the CT volume in the form of a segmentation mask; (iii) 1.3M grounded VQA pairs, in which questions and answers are all linked with reference segmentation masks, enabling models to associate visual evidence with textual explanations. All grounded reports and VQA pairs in the validation set have undergone manual verification to ensure dataset quality. We believe that RadGenome-Chest CT can significantly advance the development of multimodal medical foundation models by supporting training to generate text grounded in given segmentation regions, a capability unattainable with previous relevant datasets. We will release all segmentation masks, grounded reports, and VQA pairs to facilitate further research and development in this field.
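To make the three kinds of supervision concrete, the sketch below shows how one study's grounded annotations could be organized: each report sentence and each VQA pair carries a pointer to the segmentation mask of its anatomical region. All field names (`volume_id`, `region`, `mask_path`, etc.) are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class GroundedSentence:
    sentence: str   # one sentence from the radiology report
    region: str     # anatomical region name (one of the 197 categories)
    mask_path: str  # path to the matching segmentation mask (hypothetical layout)

@dataclass
class GroundedVQAPair:
    question: str
    answer: str
    mask_path: str  # reference mask grounding both question and answer

@dataclass
class CTStudy:
    volume_id: str
    sentences: list = field(default_factory=list)  # GroundedSentence items
    vqa_pairs: list = field(default_factory=list)  # GroundedVQAPair items

# Example: link one report sentence to its region mask.
study = CTStudy(volume_id="train_00001")
study.sentences.append(GroundedSentence(
    sentence="No focal lesion is seen in the right lung.",
    region="right lung",
    mask_path="masks/train_00001/right_lung.nii.gz",
))
print(len(study.sentences))  # -> 1
```

A record like this lets a model be trained to generate the sentence (or answer) conditioned on the volume plus the referenced mask, which is the region-guided setting the abstract describes.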