Distill-SODA: Distilling Self-Supervised Vision Transformer for Source-Free Open-Set Domain Adaptation in Computational Pathology

Developing computational pathology models is essential for reducing manual tissue typing of whole slide images, for transferring knowledge from a source domain to an unlabeled, shifted target domain, and for identifying unseen categories. We propose a practical setting that addresses all of these challenges at once: source-free open-set domain adaptation. Our methodology adapts a pre-trained source model to an unlabeled target dataset that contains both closed-set and open-set classes. Beyond the semantic shift introduced by unknown classes, our framework also handles covariate shift, which manifests as differences in color appearance between source and target tissue samples. Our method distills knowledge from a self-supervised vision transformer (ViT) guided by either robustly pre-trained transformer models or histopathology datasets, including those from the target domain. To this end, we introduce a novel style-based adversarial data augmentation whose samples serve as hard positives for self-training the ViT, yielding highly contextualized embeddings. We then cluster semantically similar target images, with the source model providing weak pseudo-labels whose confidence is unreliable. To improve this step, we present the closed-set affinity score (CSAS), which corrects the confidence of these pseudo-labels and is used to compute weighted class prototypes in the contextualized embedding space. Our approach achieves state-of-the-art results on three public histopathology datasets for colorectal cancer assessment. Notably, our self-training method integrates seamlessly with open-set detection methods, improving performance on both closed-set and open-set recognition tasks.
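The abstract does not give the CSAS formula or the clustering procedure, so the following is only a minimal sketch of the general idea of confidence-weighted class prototypes, not the authors' method. It assumes the target embeddings (from the self-supervised ViT) and the source model's softmax outputs are already available as arrays. The weighting function `placeholder_affinity` is a hypothetical stand-in for CSAS (source confidence times cosine similarity to the pseudo-class mean), and all function names are illustrative. Open-set classes and the clustering step are omitted.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def placeholder_affinity(z: np.ndarray, probs: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL stand-in for the closed-set affinity score (CSAS).

    Weight = max softmax probability of the source model times the cosine
    similarity between a sample's embedding and the (unweighted) mean
    embedding of its pseudo-class, clipped at zero. The real CSAS may differ.
    """
    labels = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    weights = np.zeros(len(z))
    for k in np.unique(labels):
        idx = labels == k
        centroid = l2_normalize(z[idx].mean(axis=0, keepdims=True))[0]
        weights[idx] = conf[idx] * np.clip(z[idx] @ centroid, 0.0, None)
    return weights


def weighted_prototypes(z: np.ndarray, probs: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One prototype per class: weighted mean of embeddings carrying that pseudo-label."""
    labels = probs.argmax(axis=1)
    protos = np.zeros((probs.shape[1], z.shape[1]))
    for k in range(probs.shape[1]):
        idx = labels == k
        total = weights[idx].sum()
        if total > 0:  # classes with no (or zero-weight) members keep a zero prototype
            protos[k] = (weights[idx][:, None] * z[idx]).sum(axis=0) / total
    return l2_normalize(protos)


def refine_pseudo_labels(embeddings: np.ndarray, probs: np.ndarray):
    """Relabel each sample by its most similar (cosine) weighted prototype."""
    z = l2_normalize(embeddings)
    w = placeholder_affinity(z, probs)
    protos = weighted_prototypes(z, probs, w)
    return (z @ protos.T).argmax(axis=1), w


# Toy example: sample 1 lies next to class 0 in embedding space,
# but the source model (weakly) assigns it to class 1.
emb = np.array([[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]])
probs = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.3, 0.7]])
labels, weights = refine_pseudo_labels(emb, probs)
print(labels.tolist())  # [0, 0, 1, 1]
```

In this toy case the mislabeled sample receives a low weight (it is far from the other class-1 embeddings), so it barely shifts the class-1 prototype and is reassigned to class 0 by nearest-prototype matching. This illustrates why down-weighting unreliable pseudo-labels before computing prototypes can help; how the paper's CSAS actually computes those weights is not specified here.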
