The Cox Proportional Hazards (Cox-PH) model is widely used in survival analysis, and artificial neural network (ANN)-based Cox-PH models have recently been developed. However, training these models on high-dimensional features typically requires a large number of labeled samples, that is, samples with time-to-event information. The limited availability of such labeled data often constrains the performance of ANN-based Cox models. To address this issue, we employed a deep semi-supervised learning (DSSL) approach based on the Mean Teacher (MT) framework, which uses both labeled and unlabeled data for training, to develop single- and multi-modal ANN-based Cox models. We applied our model, named Cox-MT, to predict the prognosis of several types of cancer using data from The Cancer Genome Atlas (TCGA). Our single-modal Cox-MT models, trained on TCGA RNA-seq data or whole slide images, significantly outperformed the existing ANN-based Cox model, Cox-nnet, on the same data sets across all four cancer types considered. For a fixed set of labeled data, the performance of Cox-MT improved significantly as the number of unlabeled samples increased. Furthermore, our multi-modal Cox-MT model performed considerably better than the single-modal models. In summary, by leveraging both labeled and unlabeled data, the Cox-MT model significantly improves prediction accuracy over existing ANN-based Cox models trained solely on labeled data.
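The ingredients named in the abstract can be illustrated with a minimal NumPy sketch: a Cox negative log partial likelihood computed on labeled samples, a consistency term comparing student and teacher predictions (which needs no labels), and the exponential moving average (EMA) that updates the teacher's weights in the Mean Teacher framework. This is not the authors' implementation. The function names, the mean-squared-error form of the consistency term, and the `alpha` value are assumptions made for illustration, and tied event times receive no special handling.

```python
import numpy as np

def cox_neg_log_partial_likelihood(risk, time, event):
    """Cox negative log partial likelihood, averaged over observed events.

    risk  : predicted log-hazard score per sample
    time  : observed time (event or censoring) per sample
    event : 1 if the event was observed, 0 if censored
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    # at_risk[i, j] is True when sample j is still at risk at sample i's time
    at_risk = time[None, :] >= time[:, None]
    # log-sum-exp over each risk set, shifted by the max score for stability
    shift = risk.max()
    log_risk_set = shift + np.log((at_risk * np.exp(risk - shift)[None, :]).sum(axis=1))
    n_events = max(int(event.sum()), 1)
    return float(-np.sum((risk - log_risk_set)[event]) / n_events)

def consistency_loss(student_risk, teacher_risk):
    """Mean squared difference between student and teacher scores (assumed form)."""
    student_risk = np.asarray(student_risk, dtype=float)
    teacher_risk = np.asarray(teacher_risk, dtype=float)
    return float(np.mean((student_risk - teacher_risk) ** 2))

def ema_update(teacher_weights, student_weights, alpha=0.99):
    """Move the teacher's weights toward the student's weights by an EMA step."""
    return alpha * teacher_weights + (1.0 - alpha) * student_weights
```

In a training loop under these assumptions, the student would minimize the Cox term on labeled samples plus a weighted consistency term on all samples, and `ema_update` would be applied to the teacher after each student step.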