Generation of Chest CT Pulmonary Nodule Images by Latent Diffusion Models Using the LIDC-IDRI Dataset

In recent years, computer-aided diagnosis systems have been developed to support diagnosis, but their performance depends heavily on the quality and quantity of training data. In clinical practice, however, it is difficult to collect large numbers of CT images for specific cases, such as small cell carcinoma, which has a low epidemiological incidence, or benign tumors that are difficult to distinguish from malignant ones. This results in imbalanced training data. To address this issue, we propose a method that uses latent diffusion models (LDMs) to automatically generate chest CT nodule images that capture target features, and we verify its effectiveness. Using the LIDC-IDRI dataset, we created pairs of nodule images and finding-based text prompts derived from physician evaluations. As image generation models, we used Stable Diffusion version 1.5 (SDv1) and version 2.0 (SDv2), both of which are LDMs, and fine-tuned each on the created dataset. During generation, we varied the guidance scale (GS), which controls how closely the output follows the input text. Both quantitative and subjective evaluations showed that SDv2 with GS = 5 achieved the best performance in terms of image quality, diversity, and text consistency. In the subjective evaluation, no statistically significant differences were observed between generated and real images, suggesting that the generated images were comparable in quality to real clinical images. These results demonstrate that the proposed method can generate high-quality chest CT nodule images from input text that capture specific medical features.
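To illustrate what the guidance scale does, the sketch below shows the standard classifier-free guidance combination used by Stable Diffusion at each denoising step. The abstract does not describe how GS enters the sampler, so this assumes the usual formulation; the function name `apply_guidance` and the toy values are hypothetical and are not taken from the study.

```python
from typing import List


def apply_guidance(
    eps_uncond: List[float],
    eps_cond: List[float],
    guidance_scale: float,
) -> List[float]:
    """Combine noise estimates with classifier-free guidance.

    eps_uncond: noise predicted with an empty text prompt.
    eps_cond:   noise predicted with the finding-based text prompt.
    The result starts at eps_uncond and moves toward (and past) eps_cond
    in proportion to guidance_scale.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("noise estimates must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy noise estimates for a 3-element latent (illustrative values only).
eps_uncond = [0.0, 1.0, -1.0]
eps_cond = [1.0, 1.0, 0.0]

for gs in (1.0, 5.0):
    print(gs, apply_guidance(eps_uncond, eps_cond, gs))
```

With GS = 1 the result equals the text-conditioned estimate. Larger values such as GS = 5 amplify the difference between the conditioned and unconditioned estimates, which in general strengthens adherence to the prompt and tends to reduce sample diversity.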
