Leveraging Global Binary Masks for Structure Segmentation in Medical Images

Deep learning (DL) models for medical image segmentation are highly sensitive to intensity variations in the input images and generalize poorly because they rely primarily on pixel intensity information for inference. Acquiring sufficient training data is another challenge that limits the models' applications. We proposed to leverage the consistency of organs' anatomical shape and position in medical images, and introduced a framework that exploits these recurring anatomical patterns through global binary masks for organ segmentation. Two scenarios were studied: 1) global binary masks were the only input to the model (a U-Net), forcing it to encode exclusively the organs' position and shape information for segmentation/localization; 2) global binary masks were incorporated as an additional channel, serving as position/shape clues to mitigate training data scarcity. Two datasets of brain and heart CT images with their ground-truth annotations were split into (26:10:10) and (12:3:5) for training, validation, and testing, respectively. Training exclusively on global binary masks led to Dice scores of 0.77 (0.06) and 0.85 (0.04), with average Euclidean distances of 3.12 (1.43) mm and 2.5 (0.93) mm relative to the center of mass of the ground truth, for the brain and heart structures respectively. These outcomes indicate that a surprising degree of position and shape information is encoded through global binary masks. In small subsets of training data, incorporating global binary masks led to significantly higher accuracy than the model trained on CT images alone; performance improved by 4.3-125.3% and 1.3-48.1% for 1-8 training cases of the brain and heart datasets respectively. The findings suggest that global binary masks are advantageous both for building generalizable models and for compensating for training data scarcity.
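For readers unfamiliar with the two reported metrics, the following is a minimal NumPy sketch of how a Dice score and a center-of-mass distance in millimetres could be computed, together with one plausible way of attaching a binary mask as a second input channel. It is an illustration under stated assumptions, not the authors' implementation: the function names, the voxel `spacing_mm` argument, and the channel-first layout are all hypothetical, and the abstract does not specify how the global binary masks themselves are constructed, so the mask is taken as given.

```python
import numpy as np


def dice_score(pred, truth):
    """Dice overlap between two binary masks (defined as 1.0 if both are empty)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / total


def center_of_mass_mm(mask, spacing_mm):
    """Centroid of a non-empty binary mask, scaled by voxel spacing (assumed known)."""
    coords = np.argwhere(np.asarray(mask, dtype=bool))
    return coords.mean(axis=0) * np.asarray(spacing_mm, dtype=float)


def centroid_distance_mm(pred, truth, spacing_mm):
    """Euclidean distance between predicted and ground-truth centroids."""
    diff = center_of_mass_mm(pred, spacing_mm) - center_of_mass_mm(truth, spacing_mm)
    return float(np.linalg.norm(diff))


def add_mask_channel(ct, global_mask):
    """Stack a CT array and a same-shaped binary mask as two channels (channel-first)."""
    ct = np.asarray(ct, dtype=np.float32)
    global_mask = np.asarray(global_mask, dtype=np.float32)
    if ct.shape != global_mask.shape:
        raise ValueError("CT image and mask must have the same shape")
    return np.stack([ct, global_mask], axis=0)


# Toy example: two 4x4 squares offset by one row on an 8x8 grid.
truth = np.zeros((8, 8), dtype=np.uint8)
truth[2:6, 2:6] = 1
pred = np.zeros((8, 8), dtype=np.uint8)
pred[3:7, 2:6] = 1

print(dice_score(pred, truth))                          # 0.75
print(centroid_distance_mm(pred, truth, (2.0, 1.0)))    # 2.0
print(add_mask_channel(np.zeros((8, 8)), truth).shape)  # (2, 8, 8)
```

In the toy example, the two squares share 12 of their 16 pixels each, giving a Dice score of 2·12/32 = 0.75, and their centroids differ by one row, which corresponds to 2.0 mm at the assumed 2.0 mm row spacing.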
