Nuclei segmentation is a fundamental but challenging task in the quantitative analysis of histopathology images. Although fully-supervised deep learning-based methods have made significant progress, they require a large number of labeled images to achieve strong segmentation performance. Because manually labeling every nucleus instance in a dataset is inefficient, building a large-scale human-annotated dataset is time-consuming and labor-intensive. Augmenting a dataset that contains only a few labeled images in order to improve segmentation performance is therefore of significant research and practical value. In this paper, we introduce the first diffusion-based augmentation method for nuclei segmentation. The idea is to synthesize a large number of labeled images to facilitate training of the segmentation model. To achieve this, we propose a two-step strategy. In the first step, we train an unconditional diffusion model to synthesize the Nuclei Structure, defined as a representation that combines the pixel-level semantic map and the distance transform. Each synthetic nuclei structure serves as a constraint on histopathology image synthesis and is further post-processed into an instance map. In the second step, we train a conditional diffusion model to synthesize histopathology images from the nuclei structures. The synthetic histopathology images, paired with the synthetic instance maps, are added to the real dataset for training the segmentation model. The experimental results show that by augmenting only 10% of the labeled real dataset with synthetic samples, one can achieve segmentation results comparable to those of the fully-supervised baseline. The code is released at: https://github.com/lhaof/Nudiff
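For the second step, the sketch below shows how one training example for a structure-conditioned denoiser could be prepared. It assumes the standard DDPM forward process, x_t = sqrt(ᾱ_t) · x_0 + sqrt(1 − ᾱ_t) · ε, with a linear β schedule, and it assumes the nuclei structure is supplied by concatenating it to the noisy image along the channel axis. Both the schedule values and the concatenation-based conditioning are illustrative assumptions, not necessarily the paper's design; `linear_alpha_bars` and `conditional_training_input` are hypothetical names, and the denoising network itself is omitted.

```python
import numpy as np


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def conditional_training_input(image, structure, t, alpha_bars, rng):
    """Prepare one training example for a structure-conditioned denoiser.

    image:     (C, H, W) histopathology image scaled to [-1, 1]
    structure: (K, H, W) nuclei structure used as the condition
    Returns the denoiser input (C + K, H, W) and the noise it should predict.
    """
    noise = rng.standard_normal(image.shape)
    a = alpha_bars[t]
    # Forward process q(x_t | x_0): x_t = sqrt(a) * x_0 + sqrt(1 - a) * noise
    noisy = np.sqrt(a) * image + np.sqrt(1.0 - a) * noise
    # Assumed conditioning: stack the nuclei structure onto the noisy image.
    model_input = np.concatenate([noisy, structure], axis=0)
    return model_input, noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bars = linear_alpha_bars()
    image = rng.uniform(-1.0, 1.0, size=(3, 64, 64))
    structure = rng.uniform(0.0, 1.0, size=(2, 64, 64))
    t = int(rng.integers(0, len(alpha_bars)))  # timestep sampled uniformly
    x, eps = conditional_training_input(image, structure, t, alpha_bars, rng)
    print(x.shape, eps.shape)  # (5, 64, 64) (3, 64, 64)
```

A denoiser trained to predict `noise` from `model_input` and `t` would, at sampling time, be run with a synthetic nuclei structure as its condition; under these assumptions, that is what pairs each generated image with the instance map obtained from the same structure.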