The limited sample size and insufficient diversity of lung-nodule CT datasets severely restrict the performance and generalization ability of detection models. Images generated by existing methods lack diversity and controllability, suffering from monotonous texture features and distorted anatomical structures. We therefore propose a two-stage generative adversarial network (TSGAN) that decouples the morphological structure and texture features of lung nodules to improve the diversity and spatial controllability of synthetic data. In the first stage, StyleGAN generates semantic segmentation masks that encode lung nodules and tissue background, controlling the anatomical structure of the synthesized image; in the second stage, a DL-Pix2Pix model translates the mask into a CT image, using local importance attention to capture local features and dynamic-weight multi-head window attention to strengthen the modeling of nodule texture and background. Compared with training on the original dataset alone, detection accuracy on LUNA16 improves by 4.6% and mAP by 4%. Experimental results demonstrate that TSGAN improves both the quality of synthetic images and the performance of detection models.
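The decoupled two-stage pipeline can be illustrated schematically. The sketch below is a minimal placeholder, not the paper's implementation: `stage1_generate_mask` stands in for the StyleGAN mask generator (here it just thresholds a random field into three semantic classes and ignores the latent code), and `stage2_mask_to_ct` stands in for the DL-Pix2Pix translator (here it maps each class to an approximate Hounsfield-unit range plus noise). The function names, class labels, and intensity values are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def stage1_generate_mask(latent, size=64):
    """Stand-in for the StyleGAN mask generator: a latent code would be mapped
    to a semantic mask with labels {0: air, 1: lung tissue, 2: nodule}.
    This placeholder ignores the latent and thresholds a random field."""
    field = rng.normal(size=(size, size))
    return np.digitize(field, bins=[-0.5, 1.0])  # values in {0, 1, 2}

def stage2_mask_to_ct(mask):
    """Stand-in for the DL-Pix2Pix translator: maps each semantic class to an
    approximate Hounsfield-unit intensity and adds noise as fake texture."""
    base_hu = np.array([-1000.0, -700.0, 30.0])  # air, lung, soft tissue (approx.)
    return base_hu[mask] + rng.normal(scale=20.0, size=mask.shape)

# Two-stage synthesis: structure first, then texture.
latent = rng.normal(size=512)
mask = stage1_generate_mask(latent)   # anatomical structure (stage 1)
ct = stage2_mask_to_ct(mask)          # textured CT slice (stage 2)
```

The key point the sketch captures is the interface: stage 1 fixes anatomy (the mask), and stage 2 renders texture conditioned on that mask, so the two factors can be varied independently when augmenting the dataset.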