Mapping the landscape of histomorphological cancer phenotypes using self-supervised learning on unlabeled, unannotated pathology slides

Adalberto Claudio Quiros, Nicolas Coudray, Anna Yeaton, Xinyu Yang, Bojing Liu, Hortense Le, Luis Chiriboga, Afreen Karimkhan, Navneet Narula, David A. Moore, Christopher Y. Park, Harvey Pass, Andre L. Moreira, John Le Quesne, Aristotelis Tsirigos, Ke Yuan

Definitive cancer diagnosis and management depend upon the extraction of information from microscopy images by pathologists. These images contain complex information requiring time-consuming expert human interpretation that is prone to human bias. Supervised deep learning approaches have proven powerful for classification tasks, but they are inherently limited by the cost and quality of annotations used for training these models. To address this limitation of supervised methods, we developed Histomorphological Phenotype Learning (HPL), a fully self-supervised methodology that requires no expert labels or annotations and operates via the automatic discovery of discriminatory image features in small image tiles. Tiles are grouped into morphologically similar clusters which constitute a library of histomorphological phenotypes, revealing trajectories from benign to malignant tissue via inflammatory and reactive phenotypes. These clusters have distinct features which can be identified using orthogonal methods, linking histologic, molecular and clinical phenotypes. Applied to lung cancer tissues, we show that they align closely with patient survival, with histopathologically recognised tumor types and growth patterns, and with transcriptomic measures of immunophenotype. We then demonstrate that these properties are maintained in a multi-cancer study. These results show the clusters represent recurrent host responses and modes of tumor growth emerging under natural selection. Code, pre-trained models, learned embeddings, and documentation are available to the community at https://github.com/AdalbertoCq/Histomorphological-Phenotype-Learning
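As a rough illustration of the tile-to-cluster idea described above, the sketch below assigns tile embeddings to clusters and summarises a slide by the fraction of its tiles in each cluster. It is not the HPL implementation: the abstract names neither the encoder nor the clustering algorithm, so nearest-centroid assignment is used here as a simple stand-in, and the slide-level composition vector is only one plausible way of linking clusters to slide-level outcomes. The function names, centroids, and embeddings are all hypothetical toy values.

```python
from collections import Counter
from math import dist


def assign_cluster(embedding, centroids):
    """Return the index of the centroid nearest to one tile embedding.

    Nearest-centroid assignment is an assumption made for illustration;
    the abstract does not say which clustering method HPL uses.
    """
    return min(range(len(centroids)), key=lambda k: dist(embedding, centroids[k]))


def slide_composition(tile_embeddings, centroids):
    """Return the fraction of a slide's tiles that fall in each cluster."""
    if not tile_embeddings:
        raise ValueError("a slide needs at least one tile")
    counts = Counter(assign_cluster(e, centroids) for e in tile_embeddings)
    total = len(tile_embeddings)
    return [counts[k] / total for k in range(len(centroids))]


# Toy 2-D embeddings; real tile embeddings would come from a
# self-supervised encoder and have far more dimensions.
centroids = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
tiles = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9), (0.2, 0.1)]

composition = slide_composition(tiles, centroids)
print(composition)
```

A vector of this kind gives each slide a fixed-length description regardless of how many tiles it contains, which is what would make a comparison against survival or tumor type straightforward under these assumptions.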
