Generative modeling has progressed substantially in recent years, particularly in text-to-image and text-to-video synthesis. However, the medical field has not yet fully exploited the potential of large-scale foundation models for synthetic data generation. In this paper, we introduce GenerateCT, the first method for text-conditional computed tomography (CT) generation, which addresses limitations in 3D medical imaging research, and we release the entire framework as open source. GenerateCT consists of a pre-trained large language model, a transformer-based text-conditional 3D chest CT generation architecture, and a text-conditional spatial super-resolution diffusion model. We also propose CT-ViT, which efficiently compresses CT volumes while preserving autoregressiveness along the depth dimension, enabling the generation of 3D CT volumes with a variable number of axial slices. Our experiments demonstrate that GenerateCT can produce realistic, high-resolution, high-fidelity 3D chest CT volumes that are consistent with medical-language text prompts. We further investigate the potential of GenerateCT by training a model on generated CT volumes for multi-abnormality classification of chest CT volumes. Our contributions provide a foundation for future research in text-conditional 3D medical image generation and have the potential to accelerate progress in medical imaging research. Our code, pre-trained models, and generated data are available at https://github.com/ibrahimethemhamamci/GenerateCT.