Computer vision models are increasingly capable of classifying ovarian epithelial cancer subtypes, but they differ from pathologists by processing small tissue patches at a single resolution. Multi-resolution graph models leverage the spatial relationships of patches at multiple magnifications, learning the context for each patch. In this study, we conduct the most thorough validation of a graph model for ovarian cancer subtyping to date. Seven models were tuned and trained using five-fold cross-validation on a set of 1864 whole slide images (WSIs) from 434 patients treated at Leeds Teaching Hospitals NHS Trust. The cross-validation models were ensembled and evaluated using a balanced hold-out test set of 100 WSIs from 30 patients, and an external validation set of 80 WSIs from 80 patients in the Transcanadian Study. The best-performing model, a graph model using 10x+20x magnification data, gave balanced accuracies of 73%, 88%, and 99% in cross-validation, hold-out testing, and external validation, respectively. However, it exceeded the performance of attention-based multiple instance learning (93% balanced accuracy) only in external validation. Graph models benefitted greatly from using the UNI foundation model rather than an ImageNet-pretrained ResNet50 for feature extraction, an effect much greater than that of changing the subsequent classification approach. The accuracy of the combined foundation model and multi-resolution graph network offers a step towards the clinical applicability of these models, with a new highest-reported performance for this task, though further validations are still required to ensure the robustness and usability of the models.
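The attention-based multiple instance learning baseline referenced above aggregates patch-level embeddings into a single slide-level representation via learned attention weights. A minimal sketch of that pooling step is below; the layer sizes, patch count, and 1024-dimensional embeddings are illustrative assumptions, not the study's actual configuration.

```python
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    """Attention-based MIL pooling: scores each patch embedding,
    normalises the scores over the bag, and returns the weighted
    sum as a slide-level representation."""

    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (n_patches, dim) embeddings from a feature
        # extractor (dim=1024 assumed here for illustration)
        scores = self.attn(patches)              # (n_patches, 1)
        weights = torch.softmax(scores, dim=0)   # normalise over the bag
        return (weights * patches).sum(dim=0)    # (dim,) slide embedding

# Usage: pool 500 hypothetical patch embeddings into one slide vector
pool = AttentionMILPooling(dim=1024)
bag = torch.randn(500, 1024)
slide_embedding = pool(bag)
print(slide_embedding.shape)
```

A downstream linear classifier over `slide_embedding` would then predict the subtype; the attention weights also offer a per-patch interpretability signal.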