Survival analysis is a crucial semi-supervised task in machine learning with numerous real-world applications, particularly in healthcare. Currently, the most common approach to survival analysis is based on Cox's partial likelihood, which can be interpreted as a ranking model optimized on a lower bound of the concordance index. This relation between ranking models and Cox's partial likelihood considers only pairwise comparisons. Recent work has developed differentiable sorting methods which relax this pairwise independence assumption, enabling the ranking of sets of samples. However, current differentiable sorting methods cannot account for censoring, a key factor in many real-world datasets. To address this limitation, we propose a novel method called Diffsurv. We extend differentiable sorting methods to handle censored tasks by predicting matrices of possible permutations that take into account the label uncertainty introduced by censored samples. We contrast this approach with methods derived from partial likelihood and ranking losses. Our experiments show that Diffsurv outperforms established baselines in various simulated and real-world risk prediction scenarios. Additionally, we demonstrate the benefits of the algorithmic supervision enabled by Diffsurv by presenting a novel method for top-k risk prediction that outperforms current methods.
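To make the role of censoring in pairwise ranking concrete, the following is a minimal sketch of Harrell's concordance index under right-censoring. It is illustrative only and is not the Diffsurv method; the function name `concordance_index` and the toy data are assumptions introduced here. A pair (i, j) counts as comparable only when sample i has an observed event and its time is strictly earlier than the time recorded for sample j, so pairs whose ordering is hidden by censoring are skipped.

```python
import numpy as np


def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data (hypothetical helper).

    times  : observed time per sample (event time or censoring time)
    events : 1 if the event was observed, 0 if the sample was censored
    risks  : predicted risk score, higher meaning an earlier expected event
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    risks = np.asarray(risks, dtype=float)

    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A censored sample has no known event time, so it cannot be
            # the earlier member of a comparable pair.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half

    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (assumed): the second sample is censored at time 4.
times = [2, 4, 5, 7]
events = [1, 0, 1, 1]
risks = [0.9, 0.2, 0.6, 0.1]
score = concordance_index(times, events, risks)
```

In this toy data the pair formed by the censored sample (time 4) and the sample with an event at time 5 is never counted, because the censored sample's true event time could fall on either side of 5. This is the kind of label uncertainty that pairwise objectives handle by discarding pairs.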