A General-Purpose Self-Supervised Model for Computational Pathology

Richard J. Chen, Tong Ding, Ming Y. Lu, Drew F. K. Williamson, Guillaume Jaume, Bowen Chen, Andrew Zhang, Daniel Shao, Andrew H. Song, Muhammad Shaban, Mane Williams, Anurag Vaidya, Sharifa Sahai, Lukas Oldenburg, Luca L. Weishaupt, Judy J. Wang, Walt Williams, Long Phi Le, Georg Gerber, Faisal Mahmood

Tissue phenotyping is a fundamental computational pathology (CPath) task in learning objective characterizations of histopathologic biomarkers in anatomic pathology. However, whole-slide imaging (WSI) poses a complex computer vision problem in which the large-scale image resolutions of WSIs and the enormous diversity of morphological phenotypes preclude large-scale data annotation. Current efforts have proposed using pretrained image encoders with either transfer learning from natural image datasets or self-supervised pretraining on publicly available histopathology datasets, but these encoders have not been extensively developed and evaluated across diverse tissue types at scale. We introduce UNI, a general-purpose self-supervised model for pathology, pretrained using over 100 million tissue patches from over 100,000 diagnostic haematoxylin and eosin-stained WSIs across 20 major tissue types, and evaluated on 33 representative clinical tasks in CPath of varying diagnostic difficulty. In addition to outperforming previous state-of-the-art models, we demonstrate new modeling capabilities in CPath such as resolution-agnostic tissue classification, slide classification using few-shot class prototypes, and disease subtyping generalization in classifying up to 108 cancer types in the OncoTree code classification system. UNI advances unsupervised representation learning at scale in CPath in terms of both pretraining data and downstream evaluation, enabling data-efficient AI models that can generalize and transfer to a gamut of diagnostically challenging tasks and clinical workflows in anatomic pathology.
