Positive-unlabeled learning for binary and multi-class cell detection in histopathology images with incomplete annotations

Source: arXiv. Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA), https://melba-journal.org/2022:027. arXiv admin note: text overlap with arXiv:2106.15918.

Cell detection in histopathology images is of great interest to clinical practice and research, and convolutional neural networks (CNNs) have achieved remarkable cell detection results. Typically, to train CNN-based cell detection models, every positive instance in the training images needs to be annotated, and instances that are not labeled as positive are considered negative samples. However, manual cell annotation is complicated due to the large number and diversity of cells, and it can be difficult to ensure the annotation of every positive instance. In many cases, only incomplete annotations are available, where some of the positive instances are annotated and the others are not, and the classification loss term for negative samples in typical network training becomes incorrect. In this work, to address this problem of incomplete annotations, we propose to reformulate the training of the detection network as a positive-unlabeled learning problem. Since the instances in unannotated regions can be either positive or negative, they have unknown labels. Using the samples with unknown labels and the positively labeled samples, we first derive an approximation of the classification loss term corresponding to negative samples for binary cell detection, and based on this approximation we further extend the proposed framework to multi-class cell detection. For evaluation, experiments were performed on four publicly available datasets. The experimental results show that our method improves the performance of cell detection in histopathology images given incomplete annotations for network training.

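The abstract describes approximating the negative-sample classification loss from positively labeled and unlabeled samples, but does not give the formula. As a rough illustration only, the sketch below uses the generic non-negative positive-unlabeled risk estimator from the PU learning literature, which may differ from the paper's exact derivation. The function names, the cross-entropy loss choice, and the `prior` parameter (the assumed fraction of positives among all samples, which would have to be estimated in practice) are assumptions made for this example.

```python
import numpy as np

EPS = 1e-7  # keeps log() finite for probabilities at 0 or 1


def loss_as_positive(p):
    """Cross-entropy loss when a predicted cell probability p is treated as positive."""
    return -np.log(np.clip(p, EPS, 1.0 - EPS))


def loss_as_negative(p):
    """Cross-entropy loss when a predicted cell probability p is treated as negative."""
    return -np.log(1.0 - np.clip(p, EPS, 1.0 - EPS))


def pu_classification_loss(p_pos, p_unl, prior):
    """Generic non-negative PU loss (illustrative, not the paper's exact form).

    p_pos: predicted probabilities for positively labeled samples
    p_unl: predicted probabilities for unlabeled samples
    prior: assumed proportion of positives (hypothetical input)
    """
    # Loss on the annotated positives, weighted by the class prior.
    pos_term = prior * loss_as_positive(p_pos).mean()
    # Negative term approximated without any negative labels:
    # unlabeled samples are a mixture of positives and negatives, so the
    # positives' contribution is subtracted out using the labeled positives.
    neg_term = loss_as_negative(p_unl).mean() - prior * loss_as_negative(p_pos).mean()
    # Clip at zero so the estimated negative risk cannot go below zero.
    return pos_term + max(neg_term, 0.0)
```

With `prior = 0`, the expression reduces to the usual practice of treating every unannotated instance as negative, which is the setting the abstract identifies as incorrect when annotations are incomplete.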