Time-to-event analysis, or survival analysis, provides valuable insights into clinical prognosis and treatment recommendations. However, this task is typically more challenging than other regression tasks because of censored observations. Moreover, clinicians remain concerned about the reliability of predictions, mainly because the predictions lack confidence assessment, robustness, and calibration. To address these challenges, we introduce an evidential regression model designed specifically for time-to-event prediction, in which the most plausible event time is directly quantified by aggregated Gaussian random fuzzy numbers (GRFNs). GRFNs are a newly introduced family of random fuzzy subsets of the real line that generalizes both Gaussian random variables and Gaussian possibility distributions. Unlike conventional methods, which build models on strict assumptions about the data distribution (e.g., proportional hazards), our model assumes only that the event time is encoded in a GRFN on the real line, and therefore offers more flexibility in complex data scenarios. Furthermore, the epistemic and aleatory uncertainty about the event time is also quantified within the aggregated GRFN. Our model can therefore provide more detailed guidance for clinical decision-making, with two additional dimensions of information. The model is fit by minimizing a generalized negative log-likelihood function that accounts for data censoring through reasoning with uncertain evidence. Experimental results on simulated datasets with varying data distributions and censoring scenarios, as well as on real-world datasets across diverse clinical settings and tasks, demonstrate that our model achieves both accurate and reliable performance, outperforming state-of-the-art methods.
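The statement that GRFNs generalize both Gaussian random variables and Gaussian possibility distributions can be illustrated with a minimal sketch. It assumes the usual parameterization from the random fuzzy set literature, which the abstract does not spell out: a GRFN is a Gaussian fuzzy number with membership function exp(-h (x - M)^2 / 2), whose mode M is drawn from a normal distribution with mean mu and standard deviation sigma. The function names are hypothetical, and this is not the authors' implementation.

```python
import math
import random


def gfn_membership(x, mode, h):
    """Membership degree of x in a Gaussian fuzzy number with the given
    mode and precision h (assumed form: exp(-h * (x - mode)^2 / 2))."""
    if math.isinf(h):
        # Infinite precision: the fuzzy number collapses to the single point `mode`.
        return 1.0 if x == mode else 0.0
    return math.exp(-0.5 * h * (x - mode) ** 2)


def sample_grfn(mu, sigma, h, rng):
    """Draw one realization of a GRFN (hypothetical helper).

    Returns the membership function of the drawn fuzzy number and its mode.
    The mode is random, distributed as N(mu, sigma^2).
    """
    mode = rng.gauss(mu, sigma) if sigma > 0 else mu

    def membership(x):
        return gfn_membership(x, mode, h)

    return membership, mode


rng = random.Random(0)

# sigma = 0: no randomness is left, only a fixed Gaussian possibility distribution.
possibilistic, mode_a = sample_grfn(mu=5.0, sigma=0.0, h=2.0, rng=rng)

# h = infinity: no fuzziness is left, each draw is a plain Gaussian random variable.
probabilistic, mode_b = sample_grfn(mu=5.0, sigma=1.0, h=math.inf, rng=rng)
```

In this sketch, sigma carries the random (aleatory) spread of the mode and h carries the imprecision around it, so smaller h means a vaguer statement about the event time. How the paper's model maps these parameters to its two reported uncertainty types is not stated in the abstract.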