Shortcut learning is a phenomenon in which machine learning models preferentially learn simple, potentially misleading cues from the data that do not generalize beyond the training set. While existing research primarily investigates shortcut learning in image classification, this study extends the investigation to medical image segmentation. We demonstrate that clinical annotations such as calipers, as well as the combination of zero-padded convolutions and center-cropped training sets, can inadvertently serve as shortcuts and affect segmentation accuracy. We identify and evaluate shortcut learning on two different but common medical image segmentation tasks. In addition, we suggest strategies to mitigate the influence of shortcut learning and improve the generalizability of segmentation models. By uncovering the presence and implications of shortcuts in medical image segmentation, we provide insights and methodologies for evaluating and overcoming this pervasive challenge, and we call on the community to pay attention to shortcuts in segmentation. Our code is publicly available at https://github.com/nina-weng/shortcut_skinseg.
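To illustrate why zero-padded convolutions can expose positional cues, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical helper `conv2d_zero_pad`, a constant 5x5 image, and a 3x3 averaging kernel, all chosen only for illustration. Because the padded zeros enter the receptive field near the border, the filter response differs between border and interior pixels even though the image content is identical everywhere. A model could in principle use such a signal to infer distance from the image border, which may act as a shortcut when lesions are consistently centered in the training crops.

```python
import numpy as np


def conv2d_zero_pad(image, kernel):
    """'Same'-size 2D cross-correlation with zero padding (odd-sized kernel).

    Hypothetical helper written for illustration only.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # Surround the image with zeros so the output keeps the input size.
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant", constant_values=0.0)
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out


# A constant image carries no content-based information about position.
image = np.ones((5, 5))
kernel = np.ones((3, 3)) / 9.0  # 3x3 averaging filter

response = conv2d_zero_pad(image, kernel)

# Interior pixels see only image values; border pixels also see padded zeros,
# so the response encodes how close a pixel is to the image border.
print(np.round(response, 3))
```

In this sketch the interior response is 1, edge pixels give 6/9, and corner pixels give 4/9, so the output varies with position alone.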