Recently emerging conditional diffusion models show promise for mitigating the labor and expense of building large 3D medical imaging datasets. However, previous studies on 3D CT generation have yet to fully capitalize on semantic and textual conditions, and they have primarily focused on specific organs characterized by local structure and fixed contrast. In this work, we present GuideGen, a controllable framework that generates anatomical masks and corresponding CT volumes for the entire torso, from chest to pelvis, based on free-form text prompts. Our approach comprises three core components: a text-conditional semantic synthesizer for creating realistic full-torso anatomies; a contrast-aware autoencoder for detailed, high-fidelity feature extraction across varying contrast levels; and a latent feature generator that ensures alignment among CT images, anatomical semantics, and input prompts. To train and evaluate GuideGen, we compile a multi-modality cancer imaging dataset with paired CT volumes and clinical descriptions from 12 public TCIA datasets and one private real-world dataset. Comprehensive evaluations of generation quality, cross-modality alignment, and data usability on multi-organ and tumor segmentation tasks demonstrate GuideGen's superiority over existing CT generation methods.