Survival analysis is the branch of statistics that studies the relation between the characteristics of living entities and their survival times, taking into account the partial information carried by censored cases. A good analysis can, for example, determine whether one medical treatment for a group of patients is better than another. With the rise of machine learning, survival analysis can be framed as learning a function that maps patient characteristics to survival times. Doing so requires tackling three crucial issues. First, some patient data is censored: we do not know the true survival times of all patients. Second, data is scarce, which has led past research to treat different illness types as domains in a multi-task setup. Third, models must adapt to new or extremely rare illness types, for which few or no labels are available. In contrast to previous multi-task setups, we investigate how to efficiently adapt to a new survival target domain from multiple survival source domains. To this end, we introduce a new survival metric and a corresponding discrepancy measure between survival distributions. These allow us to define domain adaptation for survival analysis while incorporating censored data, which would otherwise have to be dropped. Our experiments on two cancer data sets show strong performance on the target domains, better treatment recommendations, and a weight matrix with a plausible interpretation.
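The abstract does not define the proposed survival metric. As a generic illustration only, and not the authors' metric, the sketch below shows one common way to keep censored cases in an error measure instead of dropping them: a hinge-style absolute error in which a censored time is treated as a lower bound on the true survival time. The function name `censored_hinge_mae` and its argument layout are assumptions made for this example.

```python
from typing import Sequence


def censored_hinge_mae(
    pred: Sequence[float], time: Sequence[float], event: Sequence[bool]
) -> float:
    """Hypothetical censoring-aware mean absolute error (illustrative sketch).

    event[i] is True if the event (e.g. death) was observed at time[i].
    event[i] is False if the patient was censored, so time[i] is only a
    lower bound on the true survival time.
    """
    if len(pred) == 0 or not (len(pred) == len(time) == len(event)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, t, e in zip(pred, time, event):
        if e:
            # Uncensored: the true survival time is known, use plain absolute error.
            total += abs(p - t)
        else:
            # Censored: the true time is >= t, so only predictions below t
            # are provably wrong; predictions at or above t cost nothing.
            total += max(0.0, t - p)
    return total / len(pred)


# One observed event and two censored patients.
score = censored_hinge_mae([10.0, 5.0, 20.0], [12.0, 8.0, 15.0], [True, False, False])
print(score)
```

Here the observed patient contributes |10 - 12| = 2, the first censored patient contributes 8 - 5 = 3 because the prediction falls below the known lower bound, and the second censored patient contributes 0, giving a mean of 5/3. Dropping the censored rows would discard the information that the second prediction (5) is too low.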