Deep learning-based diagnostic systems can provide accurate and robust quantitative analysis in digital pathology. These algorithms require large amounts of annotated training data, which are impractical to obtain in pathology due to the high resolution of histopathological images. Hence, self-supervised methods have been proposed to learn features using ad hoc pretext tasks. However, self-supervised training is time-consuming and often yields subpar feature representations because the learnt feature space lacks constraints, a problem that is particularly prominent under data imbalance. In this work, we propose to actively sample the training set using a handful of labels and a small proxy network, decreasing the sample requirement by 93% and training time by 62%.