One of the most challenging fields where Artificial Intelligence (AI) can be applied is lung cancer research, specifically non-small cell lung cancer (NSCLC). In particular, overall survival (OS), the time between diagnosis and death, is a vital indicator of patient status, enabling tailored treatment and improved OS rates. Two challenges must be taken into account in this analysis. First, few studies effectively exploit the information available from each patient, leveraging both uncensored (i.e., deceased) and censored (i.e., surviving) patients while also considering the event times. Second, incomplete data is a common issue in the medical field, typically tackled through imputation methods. Our objective is to present an AI model able to overcome these limits, learning effectively from both censored and uncensored patients and their available features, for the prediction of OS for NSCLC patients. We present a novel approach to survival analysis with missing values in the context of NSCLC, which exploits the strengths of the transformer architecture to account only for available features without requiring any imputation strategy. By making use of ad-hoc losses for OS, it accounts for both censored and uncensored patients, as well as changes in risk over time. We compared our method with state-of-the-art models for survival analysis coupled with different imputation strategies. We evaluated the results over a period of 6 years using different time granularities, obtaining a Ct-index, a time-dependent variant of the C-index, of 71.97, 77.58, and 80.72 for time units of 1 month, 1 year, and 2 years, respectively, outperforming all state-of-the-art methods regardless of the imputation method used.
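To make concrete how a concordance-style metric can use both censored and uncensored patients, the following is a minimal sketch of the plain (Harrell's) C-index for right-censored data. It is not the time-dependent Ct-index reported above, and it is not taken from the authors' code; the function name `concordance_index`, its arguments, and the toy data are assumptions introduced only for illustration. Pairs with tied observed times are skipped as a simplification.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Plain Harrell's C-index for right-censored data (illustrative sketch).

    times:  observed time per patient (death or last follow-up)
    events: 1 if death was observed (uncensored), 0 if censored
    risks:  predicted risk score; higher means earlier expected death
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let patient a be the one with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed
        # death; tied times are skipped here for simplicity.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # model ranks the earlier death as riskier
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: times in months, two deaths and two censored patients.
times = [5, 12, 20, 30]
events = [1, 0, 1, 0]
risks = [0.9, 0.4, 0.1, 0.6]
# 3 of the 4 comparable pairs are ranked correctly.
print(concordance_index(times, events, risks))
```

The function returns a value in [0, 1], whereas the figures in the abstract are expressed on a 0 to 100 scale. Censored patients still contribute here: they can serve as the longer-surviving member of a pair, which is why discarding them would waste information.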