MET protein overexpression is a targetable event in non-small cell lung cancer (NSCLC) and is the subject of active drug development. Challenges in identifying patients for these therapies include limited access to validated testing, such as standardized immunohistochemistry (IHC) assessment, and the consumption of valuable tissue by a single-gene or single-protein assay. Pre-screening algorithms that predict MET overexpression from routinely available digitized hematoxylin and eosin (H&E)-stained slides could help prioritize testing for the patients most likely to benefit. Although MET expression is not currently assessed by IHC as a matter of routine in NSCLC, next-generation sequencing is common and in some cases includes RNA expression panel testing. In this work, we leveraged a large database of matched H&E slides and RNA expression data to train a weakly supervised model that predicts MET RNA overexpression directly from H&E images. The model was evaluated on an independent holdout test set of 300 patients with MET overexpression and 289 patients with normal MET expression, achieving an ROC-AUC of 0.70 (95% percentile interval: 0.66-0.74), with stable performance across patient clinical variables and robustness to synthetic noise applied to the test set. These results suggest that H&E-based predictive models could be useful for prioritizing patients for confirmatory testing of MET protein or MET gene expression status.
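For readers unfamiliar with how an ROC-AUC and a percentile interval of this kind are typically obtained, the following is a minimal sketch, not the authors' evaluation code. It assumes the interval is a bootstrap percentile interval over resampled test patients, which the abstract does not state. The function names `roc_auc` and `bootstrap_auc_interval`, the number of resamples, and the synthetic scores in the demo are all hypothetical; only the class counts (300 and 289) come from the text.

```python
import random


def roc_auc(labels, scores):
    """ROC-AUC via the rank-sum (Mann-Whitney) identity; ties get average ranks."""
    n = len(scores)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the block of items tied with position i.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_interval(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile interval for ROC-AUC from patient-level resampling (assumed method)."""
    n = len(labels)
    if sum(labels) in (0, n):
        raise ValueError("need at least one positive and one negative label")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(roc_auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = max(0, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)
    return stats[lo_idx], stats[hi_idx]


if __name__ == "__main__":
    # Synthetic demo only: class counts mirror the abstract, scores are made up.
    rng = random.Random(42)
    labels = [1] * 300 + [0] * 289
    scores = [rng.gauss(0.75 if y else 0.0, 1.0) for y in labels]
    point = roc_auc(labels, scores)
    lo, hi = bootstrap_auc_interval(labels, scores)
    print(f"ROC-AUC = {point:.2f} (95% percentile interval: {lo:.2f}-{hi:.2f})")
```

Resampling at the patient level, rather than at the slide or tile level, keeps all data from one patient together in each resample, which is the usual choice when the reported unit of evaluation is the patient.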