MET protein overexpression is a targetable event in non-small cell lung cancer (NSCLC) and is the subject of active drug development. Identifying patients for these therapies is challenging because validated testing, such as standardized immunohistochemistry (IHC) assessment, is not widely accessible and because a single gene/protein assay consumes valuable tissue. Pre-screening algorithms that predict MET overexpression from routinely available digitized hematoxylin and eosin (H&E)-stained slides could direct testing toward the patients most likely to benefit. Although MET expression is not currently assessed by IHC as a routine part of NSCLC care, next-generation sequencing is common and in some cases includes RNA expression panel testing. In this work, we leveraged a large database of matched H&E slides and RNA expression data to train a weakly supervised model that predicts MET RNA overexpression directly from H&E images. The model was evaluated on an independent holdout test set of 300 patients with MET overexpression and 289 patients with normal expression, achieving an ROC-AUC of 0.70 (95% percentile interval: 0.66 to 0.74). Performance was stable across patient clinical variables and robust to synthetic noise applied to the test set. These results suggest that H&E-based predictive models could help prioritize patients for confirmatory testing of MET protein or MET gene expression status.
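The abstract reports an ROC-AUC with a percentile interval but does not say how the interval was computed. One common approach is a bootstrap percentile interval. The sketch below illustrates it under stated assumptions: stratified resampling within each class, 1,000 resamples, and synthetic scores drawn from two normal distributions. The function names `roc_auc` and `bootstrap_auc_interval` are hypothetical, and neither the method nor the numbers should be read as those of the study.

```python
import numpy as np


def roc_auc(labels, scores):
    """ROC-AUC from the Mann-Whitney U statistic; tied scores share an average rank."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    # Average 1-based rank of each distinct score value.
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    u_stat = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u_stat / (n_pos * n_neg))


def bootstrap_auc_interval(labels, scores, n_boot=1000, level=0.95, seed=0):
    """Return (auc, lower, upper) using a stratified bootstrap percentile interval.

    Assumption: patients are resampled with replacement within each class,
    so every resample keeps the original class sizes.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(labels)
    neg_idx = np.flatnonzero(~labels)
    aucs = np.empty(n_boot)
    for b in range(n_boot):
        idx = np.concatenate([
            rng.choice(pos_idx, size=pos_idx.size, replace=True),
            rng.choice(neg_idx, size=neg_idx.size, replace=True),
        ])
        aucs[b] = roc_auc(labels[idx], scores[idx])
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(aucs, [alpha, 1.0 - alpha])
    return roc_auc(labels, scores), float(lower), float(upper)


# Synthetic illustration only: class sizes mirror the test set described above,
# but the scores are simulated and are NOT the model's outputs.
demo_rng = np.random.default_rng(42)
demo_labels = np.concatenate([np.ones(300, dtype=bool), np.zeros(289, dtype=bool)])
demo_scores = np.concatenate([
    demo_rng.normal(0.7, 1.0, size=300),  # simulated scores, overexpressed class
    demo_rng.normal(0.0, 1.0, size=289),  # simulated scores, normal class
])
auc, lower, upper = bootstrap_auc_interval(demo_labels, demo_scores)
print(f"ROC-AUC {auc:.2f} ({lower:.2f} to {upper:.2f})")
```

Stratifying the resamples keeps both classes present in every replicate, which matters little at this sample size but avoids degenerate resamples in small subgroups, such as when checking stability across clinical variables.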