PathOrchestra: A Comprehensive Foundation Model for Computational Pathology with Over 100 Diverse Clinical-Grade Tasks

Fang Yan, Jianfeng Wu, Jiawen Li, Wei Wang, Jiaxuan Lu, Wen Chen, Zizhao Gao, Jianan Li, Hong Yan, Jiabo Ma, Minda Chen, Yang Lu, Qing Chen, Yizhi Wang, Xitong Ling, Xuenian Wang, Zihan Wang, Qiang Huang, Shengyi Hua, Mianxin Liu, Lei Ma, Tian Shen, Xiaofan Zhang, Yonghong He, Hao Chen, Shaoting Zhang, Zhe Wang

The complexity and variability inherent in high-resolution pathological images present significant challenges in computational pathology. While pathology foundation models leveraging AI have catalyzed transformative advancements, their development demands large-scale datasets, considerable storage capacity, and substantial computational resources. Furthermore, ensuring their clinical applicability and generalizability requires rigorous validation across a broad spectrum of clinical tasks. Here, we present PathOrchestra, a versatile pathology foundation model trained via self-supervised learning on a dataset comprising 300K pathological slides from 20 tissue and organ types across multiple centers. The model was rigorously evaluated on 112 clinical tasks using a combination of 61 private and 51 public datasets. These tasks encompass digital slide preprocessing, pan-cancer classification, lesion identification, multi-cancer subtype classification, biomarker assessment, gene expression prediction, and the generation of structured reports. PathOrchestra demonstrated exceptional performance across 27,755 WSIs and 9,415,729 ROIs, achieving over 0.950 accuracy in 47 tasks, including pan-cancer classification across various organs, lymphoma subtype diagnosis, and bladder cancer screening. Notably, it is the first model to generate structured reports for high-incidence colorectal cancer and diagnostically complex lymphoma, areas that are infrequently addressed by foundational models but hold immense clinical potential. Overall, PathOrchestra exemplifies the feasibility and efficacy of a large-scale, self-supervised pathology foundation model, validated across a broad range of clinical-grade tasks. Its high accuracy and reduced reliance on extensive data annotation underline its potential for clinical integration, offering a pathway toward more efficient and high-quality medical services.
