One of the most challenging fields in which Artificial Intelligence (AI) can be applied is lung cancer research, specifically non-small cell lung cancer (NSCLC). In particular, overall survival (OS), the time between diagnosis and death, is a vital indicator of patient status that helps tailor treatment and improve OS rates. This analysis poses two challenges. First, few studies fully exploit the information available for each patient by learning from both uncensored (i.e., deceased) and censored (i.e., surviving) patients while also accounting for event times. Second, incomplete data are common in the medical field and are typically handled with imputation methods. Our objective is to present an AI model that overcomes these limitations, learning effectively from both censored and uncensored patients and their available features to predict OS for NSCLC patients. We present a novel approach to survival analysis with missing values in the context of NSCLC, which exploits the strengths of the transformer architecture to account only for available features without requiring any imputation strategy. By using ad hoc loss functions for OS, it accounts for both censored and uncensored patients, as well as changes in risk over time. We compared our method with state-of-the-art models for survival analysis coupled with different imputation strategies. We evaluated the results over a 6-year period at different time granularities, obtaining a Ct-index, a time-dependent variant of the C-index, of 71.97, 77.58, and 80.72 for time units of 1 month, 1 year, and 2 years, respectively, and outperforming all state-of-the-art methods regardless of the imputation method used.
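To make the evaluation metric more concrete, the following is a minimal sketch of Harrell's C-index for right-censored data, the quantity of which the Ct-index in the abstract is a time-dependent variant. It is not the authors' evaluation code and it does not implement the Ct-index itself. The function name `concordance_index`, its arguments, and the toy data are assumptions made for illustration. A pair of patients is treated as comparable only when the patient with the shorter observed time is uncensored, which is how censored patients still contribute to the score; pairs with tied times are skipped for simplicity.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell's C-index for right-censored survival data (illustrative).

    times  -- observed time per patient (death or last follow-up)
    events -- 1 if death was observed (uncensored), 0 if censored
    risks  -- predicted risk score; higher means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A censored patient cannot be the earlier member of a pair,
            # because the true event time is unknown.
            continue
        for j in range(n):
            if times[i] < times[j]:
                # Patient i died before patient j's observed time.
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy cohort: times in months, two deaths and two censored patients.
toy_times = [5, 12, 20, 30]
toy_events = [1, 0, 1, 0]
toy_risks = [0.9, 0.4, 0.1, 0.6]
toy_score = concordance_index(toy_times, toy_events, toy_risks)
```

In this toy cohort, three of the four comparable pairs are ranked correctly, so `toy_score` is 0.75. The abstract's values (for example 71.97) appear to be on a 0 to 100 scale, which would correspond to multiplying such a score by 100.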