Diagnosing rare anemia disorders from microscopic images is challenging for skilled specialists and machine-learning methods alike. Because a single blood sample contains thousands of disease-relevant cells, the task constitutes a complex multiple-instance learning (MIL) problem. While the spatial neighborhood of red blood cells is not meaningful per se, the topology, i.e., the geometry of a blood sample as a whole, contains informative features that can remedy typical MIL issues, such as vanishing gradients and overfitting when training on limited data. We therefore develop a topology-based approach that extracts multi-scale topological features from bags of single red blood cell images. These features are used to regularize the model, enforcing the preservation of characteristic topological properties of the data. On a dataset of 71 patients suffering from rare anemia disorders, comprising 521 microscopic images of red blood cells, our experiments show that topological regularization is effective, improving performance by more than 3% in the automated classification of rare anemia disorders from single-cell images. This is the first approach that uses topological properties to regularize the MIL process.
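The abstract does not state how the topological features are computed or how the regularization term is defined, so the following is only an illustrative sketch under stated assumptions. It assumes that (a) each bag is a point cloud of per-cell feature vectors, (b) the topological features are 0-dimensional persistent homology of a Vietoris-Rips filtration, whose death times coincide (up to the filtration's scale convention) with the edge lengths of a minimum spanning tree, and (c) the regularizer penalizes mismatches between those topologically relevant distances in the input space and in the model's latent space. The function names `pairwise_distances`, `mst_edges`, and `topological_penalty` are hypothetical and are not taken from the paper.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance matrix for an (n, d) array of instance features."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mst_edges(dist):
    """Minimum spanning tree edges via Prim's algorithm, as an (n-1, 2) index array.

    The lengths of these edges are the 0-dimensional persistence death
    times of the Vietoris-Rips filtration built on `dist`.
    """
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()            # cheapest known connection to the tree
    parent = np.zeros(n, dtype=int)  # tree vertex realising that connection
    edges = []
    for _ in range(n - 1):
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        update = (dist[j] < best) & ~in_tree
        best[update] = dist[j][update]
        parent[update] = j
    return np.array(edges, dtype=int).reshape(-1, 2)


def topological_penalty(bag_inputs, bag_latents):
    """Assumed regularizer: match MST edge lengths across the two spaces.

    Distances selected by the input-space MST are compared with the same
    pairs in latent space, and vice versa.
    """
    d_x = pairwise_distances(bag_inputs)
    d_z = pairwise_distances(bag_latents)
    e_x = mst_edges(d_x)
    e_z = mst_edges(d_z)
    term_x = ((d_x[e_x[:, 0], e_x[:, 1]] - d_z[e_x[:, 0], e_x[:, 1]]) ** 2).sum()
    term_z = ((d_z[e_z[:, 0], e_z[:, 1]] - d_x[e_z[:, 0], e_z[:, 1]]) ** 2).sum()
    return 0.5 * float(term_x + term_z)


# Toy bag of three "cells" with one feature each.
bag = np.array([[0.0], [1.0], [3.0]])
penalty_same = topological_penalty(bag, bag)          # latent space preserves the topology
penalty_scaled = topological_penalty(bag, 2.0 * bag)  # latent space distorts distances
```

In a training loop, a term of this kind would be added to the classification loss with a weighting coefficient, and a differentiable framework would be needed for gradients to flow through the selected distances; both details are assumptions here rather than something the abstract specifies.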