The field of computational pathology has been transformed by recent advances in foundation models that encode histopathology regions of interest (ROIs) into versatile, transferable feature representations via self-supervised learning (SSL). However, translating these advances into solutions for complex clinical challenges at the patient and slide level remains constrained by the limited clinical data available in disease-specific cohorts, especially for rare clinical conditions. We propose TITAN, a multimodal whole-slide foundation model pretrained on 335,645 whole-slide images (WSIs) via visual self-supervised learning and vision-language alignment with corresponding pathology reports and 423,122 synthetic captions generated by a multimodal generative AI copilot for pathology. Without any finetuning or clinical labels, TITAN extracts general-purpose slide representations and generates pathology reports that generalize to resource-limited clinical scenarios such as rare disease retrieval and cancer prognosis. We evaluate TITAN on diverse clinical tasks and find that it outperforms both ROI and slide foundation models across machine learning settings including linear probing, few-shot and zero-shot classification, rare cancer retrieval, cross-modal retrieval, and pathology report generation.
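As a minimal sketch of the zero-shot classification setting mentioned above: with a vision-language-aligned model, a slide can be classified by comparing its embedding against embeddings of class-describing text prompts, with no labeled training data. The snippet below illustrates the mechanics only; the embeddings are random stand-ins, and all names, dimensions, and class prompts are assumptions, not TITAN's actual API.

```python
import numpy as np

# Stand-ins for encoder outputs: in practice, slide_emb would come from a
# slide encoder and text_embs from a text encoder trained with
# vision-language alignment, so that matching pairs have high similarity.
rng = np.random.default_rng(0)
dim = 768  # assumed embedding dimension
slide_emb = rng.normal(size=(dim,))  # one whole-slide embedding

class_prompts = [  # hypothetical class-describing prompts
    "lung adenocarcinoma",
    "squamous cell carcinoma",
    "normal lung tissue",
]
text_embs = rng.normal(size=(len(class_prompts), dim))  # one per prompt

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Zero-shot prediction: cosine similarity between the slide embedding and
# each class-prompt embedding; the highest-scoring prompt is the label.
sims = l2_normalize(text_embs) @ l2_normalize(slide_emb)
pred = class_prompts[int(np.argmax(sims))]
print(pred)
```

The same similarity machinery underlies cross-modal retrieval: ranking report embeddings against a query slide embedding (or vice versa) instead of ranking class prompts.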