AI for cancer detection is bottlenecked by data scarcity, annotation difficulty, and the low prevalence of early tumors. Tumor synthesis creates artificial tumors in medical images, which can greatly diversify the data and annotations available for AI training. However, current tumor synthesis approaches do not transfer across organs because each requires organ-specific expertise and design. This paper establishes a set of generic rules to simulate tumor development. Each cell (pixel) is initially assigned a state between zero and ten to represent the tumor population, and a tumor develops according to three rules that describe growth, invasion, and death. We apply these three generic rules to simulate tumor development, from pixel to cancer, using cellular automata. We then integrate the tumor state into the original computed tomography (CT) images to generate synthetic tumors across different organs. This tumor synthesis approach allows for sampling tumors at multiple stages and analyzing tumor-organ interaction. Clinically, a reader study involving three expert radiologists reveals that the synthetic tumors and their developing trajectories are convincingly realistic. Technically, we analyze and simulate tumor development at various stages using 9,262 raw, unlabeled CT images sourced from 68 hospitals worldwide. The performance in segmenting tumors in the liver, pancreas, and kidneys exceeds prevailing literature benchmarks, underlining the potential of tumor synthesis, especially for earlier cancer detection. The code and models are available at https://github.com/MrGiovanni/Pixel2Cancer.
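To make the three-rule idea concrete, the following is a minimal 2D sketch of a cellular automaton with growth, invasion, and death. It is not the paper's implementation: the abstract only states that states range from zero to ten and that three rules exist, so the specific rule definitions, the probabilities `grow_p` and `invade_p`, the `DEAD` marker, and the helper names `neighbor_count`, `step`, and `simulate` are all assumptions made for illustration. Blending the resulting state map into CT intensities is not covered here.

```python
import numpy as np

MAX_STATE = 10  # upper bound of the per-cell tumor population (from the abstract)
DEAD = -1       # hypothetical marker for a dead cell (assumption)


def neighbor_count(flag):
    """Count 4-connected neighbors where `flag` is True (zero padding at borders)."""
    p = np.pad(flag, 1).astype(int)
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]


def step(state, organ_mask, rng, grow_p=0.5, invade_p=0.3):
    """Apply one update of the three assumed rules and return the new state map."""
    new = state.copy()
    alive = state > 0
    saturated = state == MAX_STATE

    # Growth (assumed rule): a living, unsaturated cell gains one unit of population.
    grow = alive & ~saturated & (rng.random(state.shape) < grow_p)
    new[grow] += 1

    # Invasion (assumed rule): an empty cell inside the organ that touches a
    # saturated cell may become tumor; cells outside the organ are never invaded.
    empty = (state == 0) & organ_mask
    touched = neighbor_count(saturated) > 0
    invade = empty & touched & (rng.random(state.shape) < invade_p)
    new[invade] = 1

    # Death (assumed rule): a saturated cell fully enclosed by saturated or
    # dead neighbors dies.
    enclosed = neighbor_count(saturated | (state == DEAD)) == 4
    new[saturated & enclosed] = DEAD
    return new


def simulate(organ_mask, seed_yx, steps, seed=0):
    """Grow a tumor from one seed pixel; return the state map after each step."""
    rng = np.random.default_rng(seed)
    state = np.zeros(organ_mask.shape, dtype=np.int8)
    state[seed_yx] = 1
    snapshots = []
    for _ in range(steps):
        state = step(state, organ_mask, rng)
        snapshots.append(state.copy())
    return snapshots
```

Calling `simulate(mask, (y, x), steps=80)` with a boolean organ mask and a seed pixel inside it returns one state map per step, so earlier snapshots can stand in for earlier tumor stages. The organ mask is what couples the tumor to its surroundings in this sketch: growth stops at the organ boundary because invasion is restricted to masked cells.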