Multiple instance learning is a powerful approach for whole slide image-based diagnosis in the absence of pixel- or patch-level annotations. Despite the huge size of whole slide images, the number of individual slides is often rather small, leading to a small number of labeled samples. To improve training, we propose and investigate different data augmentation strategies for multiple instance learning based on the idea of linear interpolations of feature vectors (known as MixUp). Based on state-of-the-art multiple instance learning architectures and two thyroid cancer data sets, an exhaustive study is conducted considering a range of common data augmentation strategies. Whereas a strategy based on the original MixUp approach led to decreases in accuracy, the use of a novel intra-slide interpolation method led to consistent increases in accuracy.
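The two interpolation ideas contrasted above can be illustrated with a minimal sketch. The abstract does not specify the exact formulation, so everything below is an assumption made for illustration: the function names (`interpolate`, `intra_slide_mixup`, `inter_slide_mixup`) are hypothetical, the mixing weight is drawn from a Beta(alpha, alpha) distribution as in the original MixUp formulation, and the partner selection (a random permutation within a slide, random subsampling across slides) is one plausible choice rather than the authors' method. A slide is represented as a bag, that is, a list of patch feature vectors.

```python
import random


def interpolate(x_a, x_b, lam):
    """Linear interpolation of two feature vectors: lam * x_a + (1 - lam) * x_b."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


def intra_slide_mixup(bag, rng, alpha=1.0):
    """Mix each patch feature with another patch feature from the SAME slide.

    Both operands share the slide label, so the label stays unchanged.
    Assumption: partners come from a random permutation of the bag.
    """
    partners = list(range(len(bag)))
    rng.shuffle(partners)
    mixed = []
    for i, j in enumerate(partners):
        lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
        mixed.append(interpolate(bag[i], bag[j], lam))
    return mixed


def inter_slide_mixup(bag_a, y_a, bag_b, y_b, rng, alpha=1.0):
    """Original-MixUp-style variant: mix features AND labels of two slides.

    Assumption: bags of different size are randomly subsampled to the
    smaller size so that patches can be paired one to one.
    """
    n = min(len(bag_a), len(bag_b))
    lam = rng.betavariate(alpha, alpha)
    sub_a = rng.sample(bag_a, n)
    sub_b = rng.sample(bag_b, n)
    mixed = [interpolate(a, b, lam) for a, b in zip(sub_a, sub_b)]
    return mixed, lam * y_a + (1.0 - lam) * y_b


# Usage with synthetic features (8 patches, 4 feature dimensions).
rng = random.Random(0)
slide = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
other = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]

augmented = intra_slide_mixup(slide, rng)
mixed_bag, soft_label = inter_slide_mixup(slide, 1.0, other, 0.0, rng)
```

In this sketch the intra-slide variant keeps the bag size and the hard slide label, whereas the inter-slide variant produces a soft label between the two slide labels. Each augmented vector is a convex combination of its two inputs, so it stays within the range spanned by the features it was built from.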