Pathological image analysis is an important process for detecting abnormalities such as cancer from cell images. However, because these images are generally very large, detailed annotation is costly, which makes machine learning techniques difficult to apply. Two ways to improve abnormality detection while keeping annotation costs low are to use only slide-level labels or to use information from another dataset that has already been labeled. However, such weak supervision on its own often does not yield sufficient performance. In this paper, we propose a new task setting that aims to improve classification performance on the target dataset without increasing annotation costs. To address this setting, we propose a pipeline that combines multiple instance learning (MIL) and domain adaptation (DA) methods. Furthermore, to combine the supervisory information from both methods effectively, we propose a method for creating high-confidence pseudo-labels. We conducted experiments on a pathological image dataset that we created for this study and showed that the proposed method significantly improves classification performance compared with existing methods.
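The abstract does not specify how the high-confidence pseudo-labels are created. As an illustration only, the following minimal sketch assumes one plausible rule: a patch receives a pseudo-label only when an MIL-based classifier and a DA-based classifier both assign it the same class with confidence above a threshold. The function name `select_pseudo_labels`, the agreement rule, and the default `threshold` of 0.9 are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Optional, Sequence


def select_pseudo_labels(
    mil_probs: Sequence[float],
    da_probs: Sequence[float],
    threshold: float = 0.9,
) -> List[Optional[int]]:
    """Assign pseudo-labels to patches where two classifiers agree confidently.

    mil_probs and da_probs hold each model's predicted probability that a
    patch is abnormal. The agreement rule and threshold are hypothetical
    choices for illustration, not the method described in the paper.
    Returns 1 (abnormal), 0 (normal), or None (no pseudo-label) per patch.
    """
    if len(mil_probs) != len(da_probs):
        raise ValueError("Both models must score the same set of patches.")
    if not 0.5 < threshold <= 1.0:
        raise ValueError("threshold must lie in (0.5, 1.0].")

    labels: List[Optional[int]] = []
    for p_mil, p_da in zip(mil_probs, da_probs):
        if p_mil >= threshold and p_da >= threshold:
            # Both models are confident the patch is abnormal.
            labels.append(1)
        elif p_mil <= 1.0 - threshold and p_da <= 1.0 - threshold:
            # Both models are confident the patch is normal.
            labels.append(0)
        else:
            # Disagreement or low confidence: leave the patch unlabeled.
            labels.append(None)
    return labels


if __name__ == "__main__":
    mil = [0.95, 0.02, 0.97, 0.60]
    da = [0.99, 0.05, 0.10, 0.70]
    print(select_pseudo_labels(mil, da))
```

Under this assumed rule, patches on which the two sources of supervision disagree, or on which either is uncertain, stay unlabeled, so only the subset where both signals are consistent would be used for further training.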