The complexity and variability of high-resolution pathology images pose significant challenges for computational pathology. Although AI-based pathology foundation models have driven substantial advances, their development demands large-scale datasets, considerable storage capacity, and substantial computational resources. Furthermore, ensuring their clinical applicability and generalizability requires rigorous validation across a broad spectrum of clinical tasks. Here, we present PathOrchestra, a versatile pathology foundation model trained via self-supervised learning on a dataset of 300K pathological slides covering 20 tissue and organ types from multiple centers. The model was evaluated on 112 clinical tasks using 61 private and 51 public datasets. These tasks encompass digital slide preprocessing, pan-cancer classification, lesion identification, multi-cancer subtype classification, biomarker assessment, gene expression prediction, and structured report generation. Evaluated on 27,755 whole-slide images (WSIs) and 9,415,729 regions of interest (ROIs), PathOrchestra achieved accuracy above 0.950 in 47 tasks, including pan-cancer classification across various organs, lymphoma subtype diagnosis, and bladder cancer screening. Notably, it is the first model to generate structured reports for high-incidence colorectal cancer and diagnostically complex lymphoma, areas that are infrequently addressed by foundation models but hold immense clinical potential. Overall, PathOrchestra demonstrates the feasibility and efficacy of a large-scale, self-supervised pathology foundation model validated across a broad range of clinical-grade tasks. Its high accuracy and reduced reliance on extensive data annotation underline its potential for clinical integration, offering a pathway toward more efficient, higher-quality medical services.