Large pretrained transformers are increasingly being developed as generalised foundation models that can underpin powerful task-specific artificial intelligence models. Histopathology foundation models show promise across many tasks, but analyses have been limited by arbitrary hyperparameters that were not tuned to the specific task and dataset. We report the most rigorous single-task validation of a histopathology foundation model conducted to date, and the first performed in ovarian cancer subtyping. Attention-based multiple instance learning classifiers were compared using vision transformer and ResNet features generated through varied preprocessing and pretraining procedures. The training set consisted of 1864 whole slide images from 434 ovarian carcinoma cases at Leeds Hospitals. Five-class classification performance was evaluated through five-fold cross-validation, and the cross-validation models were ensembled for evaluation on a hold-out test set and on an external set from the Transcanadian Study. Reporting followed the TRIPOD+AI checklist. The vision transformer-based histopathology foundation model, UNI, performed best in every evaluation, achieving five-class balanced accuracies of 88% and 93% in hold-out internal and external testing, compared with the best ResNet model scores of 68% and 81%, respectively. Normalisations and augmentations aided the generalisability of ResNet-based models, but these still did not match the performance of UNI, which gave the best external performance reported in any ovarian cancer subtyping study to date. Histopathology foundation models offer a clear benefit to subtyping, improving classification performance to a degree where clinical utility is tangible, albeit with an increased computational burden. Such models could provide a second opinion in challenging cases and may improve the accuracy, objectivity, and efficiency of pathological diagnoses overall.
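The attention-based multiple instance learning classifiers mentioned above aggregate many patch-level feature vectors from a whole slide image into a single slide-level representation using learned attention weights. A minimal numpy sketch of this pooling step, following the widely used formulation of Ilse et al. (2018); the function name, dimensions, and parameter shapes here are illustrative assumptions, not the study's actual implementation:

```python
import numpy as np

def attention_mil_pool(features: np.ndarray, V: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Pool a bag of patch features into one slide-level embedding.

    features: (n_patches, d) patch embeddings (e.g. from UNI or a ResNet)
    V: (h, d) and w: (h,) are learnable attention parameters.
    Attention weight a_i is proportional to exp(w . tanh(V h_i)).
    """
    scores = w @ np.tanh(V @ features.T)   # (n_patches,) raw attention scores
    a = np.exp(scores - scores.max())      # numerically stable softmax
    a /= a.sum()
    return a @ features                    # (d,) attention-weighted average
```

In practice the pooled embedding is passed to a small classification head, and the parameters `V` and `w` are trained end-to-end with it; only slide-level subtype labels are needed, which is what makes the multiple instance learning framing attractive for whole slide images.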
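The five-class balanced accuracies reported above are the mean of per-class recalls, so each subtype counts equally regardless of how common it is in the test set; this matters in ovarian carcinoma, where high-grade serous cases dominate. A minimal sketch of the metric (the subtype abbreviations in the example are illustrative):

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred) -> float:
    """Mean of per-class recalls: each class contributes equally,
    regardless of its prevalence among the true labels."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)

# Illustrative example: HGSC recall 1/2, CCC recall 1, EC recall 1.
balanced_accuracy(["HGSC", "HGSC", "CCC", "EC"],
                  ["HGSC", "CCC", "CCC", "EC"])  # → (0.5 + 1 + 1) / 3 ≈ 0.833
```

Note how the mis-classified HGSC slide pulls the score down by a full sixth even though overall accuracy is 3/4, which is why balanced accuracy is the stricter and more informative summary for imbalanced subtype distributions.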