Discrimination performance in illness-death models with interval-censored disease data

From arXiv. Author order updated to match the published version (https://journals.sagepub.com/doi/10.1177/09622802251412855); preprint replaced with the accepted manuscript.

In clinical studies, the illness-death model is often used to describe disease progression. A subject starts disease-free and may either develop the disease and then die, or die without developing it. In clinical practice, disease can only be diagnosed at pre-specified follow-up visits, so the exact time of disease onset is often unknown, resulting in interval-censored data. This study examines the impact of ignoring the interval-censored nature of disease data on the discrimination performance of illness-death models, focusing on the time-specific Area Under the receiver operating characteristic Curve (AUC) under both the incident/dynamic and cumulative/dynamic definitions. A simulation study is conducted with data generated from Weibull transition hazards and the disease state observed only at regular intervals. Estimates are derived using different methods: the Cox model with a time-dependent binary disease marker, which ignores interval-censoring, and the illness-death model for interval-censored data, estimated with three implementations: the piecewise-constant model from the msm package, and the Weibull and M-spline models from the SmoothHazard package. These methods are also applied to a dataset of 2232 patients with high-grade soft tissue sarcoma, where the interval-censored disease state is the post-operative development of distant metastases. The results suggest that, in the presence of interval-censored disease times, it is important to account for interval-censoring not only when estimating the parameters of the model but also when evaluating the discrimination performance of the disease.
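To make the data-generating setup concrete, the following is a minimal sketch of how interval-censored illness-death data of this kind could be simulated. It is an illustration under stated assumptions, not the authors' simulation code: the Weibull shape and scale values, the visit spacing, the administrative censoring time, and the clock-reset form of the disease-to-death transition are all hypothetical choices, and the function names are invented for this example.

```python
import random

# Illustrative Weibull parameters per transition (assumptions, not from the paper):
# "01" = healthy -> diseased, "02" = healthy -> dead, "12" = diseased -> dead.
SHAPE = {"01": 1.2, "02": 1.5, "12": 1.1}
SCALE = {"01": 5.0, "02": 10.0, "12": 3.0}


def simulate_subject(rng, visit_gap=1.0, admin_censor=15.0):
    """Simulate one subject and interval-censor the disease onset time."""
    # Latent competing times out of the healthy state.
    # random.weibullvariate takes (scale, shape) in that order.
    t01 = rng.weibullvariate(SCALE["01"], SHAPE["01"])
    t02 = rng.weibullvariate(SCALE["02"], SHAPE["02"])

    if t01 < t02:
        onset = t01
        # Assumption: clock-reset sojourn time in the diseased state.
        death = onset + rng.weibullvariate(SCALE["12"], SHAPE["12"])
    else:
        onset = None
        death = t02

    end = min(death, admin_censor)      # end of follow-up
    died = death <= admin_censor        # death is observed exactly

    # Regular visits at visit_gap, 2 * visit_gap, ... up to end of follow-up.
    visits = [k * visit_gap for k in range(1, int(end // visit_gap) + 1)]

    left, right = 0.0, None  # last disease-free visit, first diseased visit
    for v in visits:
        if onset is not None and v >= onset:
            right = v
            break
        left = v

    return {"onset": onset, "left": left, "right": right,
            "end": end, "died": died}


def simulate_cohort(n, seed=1, **kwargs):
    """Simulate n independent subjects with a reproducible seed."""
    rng = random.Random(seed)
    return [simulate_subject(rng, **kwargs) for _ in range(n)]


cohort = simulate_cohort(1000, seed=1)

# Subjects who became diseased during follow-up but were never seen
# diseased at a visit (for example, they died before the next visit).
missed = sum(
    1 for s in cohort
    if s["onset"] is not None and s["onset"] <= s["end"] and s["right"] is None
)
print(f"disease onset never observed at a visit: {missed} of {len(cohort)}")
```

In this sketch the true onset time is only known to lie in the interval (`left`, `right`]. An analysis that ignores interval-censoring would typically treat `right` as if it were the onset time, and would classify the `missed` subjects as having died disease-free.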
