Source-Free Open-Set Domain Adaptation for Histopathological Images via Distilling Self-Supervised Vision Transformer

There is a strong incentive to develop computational pathology models to i) ease the burden of tissue typology annotation from whole slide histological images; ii) transfer knowledge, e.g., tissue class separability, from the withheld source domain to the distributionally shifted unlabeled target domain; and simultaneously iii) detect Open Set samples, i.e., unseen novel categories not present in the training source domain. This paper proposes a highly practical setting that addresses all of these challenges in one fell swoop: source-free Open Set domain adaptation (SF-OSDA), in which a model pre-trained on an inaccessible source dataset is adapted to an unlabeled target dataset containing Open Set samples. The central tenet of our proposed method is distilling knowledge from a self-supervised vision transformer trained in the target domain. We propose a novel style-based data augmentation, used as hard positives for self-training a vision transformer in the target domain, which yields strongly contextualized embeddings. Subsequently, semantically similar target images are clustered, while the source model provides their corresponding weak pseudo-labels with unreliable confidence. Furthermore, we propose the cluster relative maximum logit score (CRMLS) to rectify the confidence of the weak pseudo-labels and to compute weighted class prototypes in the contextualized embedding space, which are then used to adapt the source model to the target domain. Our method significantly outperforms previous methods, including open set detection, test-time adaptation, and SF-OSDA methods, setting a new state of the art on three public histopathological datasets for colorectal cancer (CRC) assessment: Kather-16, Kather-19, and CRCTP. Our code is available at https://github.com/LTS5/Proto-SF-OSDA.
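The abstract names the prototype step without giving formulas. The snippet below is an illustrative sketch only, assuming NumPy: it computes confidence-weighted class prototypes from source-model pseudo-labels in an embedding space, then flags target samples that are not close enough to any known-class prototype as open-set. The function names (`weighted_class_prototypes`, `assign_to_prototypes`), the cosine-similarity rejection threshold, and the use of the maximum softmax probability as the per-sample weight are all hypothetical stand-ins. The paper's CRMLS score and its actual adaptation procedure are not reproduced here.

```python
import numpy as np


def softmax(logits):
    # Numerically stable row-wise softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def weighted_class_prototypes(embeddings, pseudo_labels, weights, num_classes):
    """Confidence-weighted class prototypes (hypothetical helper).

    embeddings:    (N, D) target-domain embeddings, e.g. from a ViT encoder.
    pseudo_labels: (N,) integer pseudo-labels from the source model.
    weights:       (N,) non-negative per-sample confidences. The paper uses
                   its CRMLS score here; this sketch does NOT implement it.
    Returns a (num_classes, D) array of unit-norm prototypes; classes that
    receive no weight get an all-zero row.
    """
    # L2-normalise embeddings so prototypes follow cosine geometry.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    protos = np.zeros((num_classes, z.shape[1]))
    for c in range(num_classes):
        mask = pseudo_labels == c
        total = weights[mask].sum()
        if total > 0:
            # Weighted mean of the embeddings pseudo-labelled as class c.
            protos[c] = (weights[mask][:, None] * z[mask]).sum(axis=0) / total
    norms = np.linalg.norm(protos, axis=1, keepdims=True)
    # Re-normalise, leaving empty-class rows at zero instead of dividing by 0.
    return np.divide(protos, norms, out=np.zeros_like(protos), where=norms > 0)


def assign_to_prototypes(embeddings, prototypes, threshold):
    """Nearest-prototype labels; -1 marks a hypothetical open-set rejection."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = z @ prototypes.T  # cosine similarity (prototypes are unit-norm)
    best = sims.argmax(axis=1)
    # Samples whose best similarity falls below the threshold are "unknown".
    return np.where(sims.max(axis=1) >= threshold, best, -1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(20, 8))      # placeholder target embeddings
    logits = rng.normal(size=(20, 4))   # placeholder source-model logits
    probs = softmax(logits)
    labels = probs.argmax(axis=1)       # weak pseudo-labels
    conf = probs.max(axis=1)            # stand-in for a rectified confidence
    protos = weighted_class_prototypes(emb, labels, conf, num_classes=4)
    print(assign_to_prototypes(emb, protos, threshold=0.5))
```

Weighting the mean by confidence means that low-confidence pseudo-labels pull each prototype less, which is the general motivation the abstract gives for rectifying confidence before computing prototypes; how the paper derives those weights from clusters is left to the full text.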

翻译：开发计算病理学模型具有强烈动机，旨在：(i) 缓解全切片组织图像中组织类型标注的负担；(ii) 将知识（如组织类别的可分离性）从隐藏源域迁移至分布偏移的无标注目标域，同时；(iii) 检测开放集样本，即训练源域中未出现的新颖类别。本文提出一种高度实用的设置，通过一次性解决上述挑战，即无源开放集域自适应（SF-OSDA），该方法应对以下场景：在源数据集不可访问的情况下，预训练模型能够适应包含开放集样本的无标注目标域。我们方法的核心原则是从目标域训练的自监督视觉Transformer中蒸馏知识。我们提出一种新颖的基于风格的数据增强技术，作为训练目标域视觉Transformer的硬正样本，从而生成强上下文化嵌入。随后，语义相似的目标图像被聚类，同时源模型为其提供置信度不可靠的弱伪标签。进一步地，我们提出聚类相对最大对数得分（CRMLS）来修正弱伪标签的置信度，并在上下文化嵌入空间中计算加权类原型，用于在目标域上适配源模型。我们的方法在开放集检测、测试时自适应及SF-OSDA方法上显著优于先前技术，在三个结直肠癌（CRC）评估公共组织病理学数据集（Kather-16、Kather-19和CRCTP）上创下最新最优性能。代码已开源：https://github.com/LTS5/Proto-SF-OSDA。