Modeling symptom progression to identify informative subjects for a new Huntington's disease clinical trial is problematic because time to diagnosis, a key covariate, can be heavily censored. Imputation is an appealing strategy in which censored covariates are replaced with their conditional means, but existing methods showed over 200% bias under heavy censoring. Calculating these conditional means well requires estimating the survival function of the censored covariate and then integrating it from the censored value to infinity. To estimate the survival function flexibly, existing methods use the semiparametric Cox model with Breslow's estimator, which leaves the integrand for the conditional means (the estimated survival function) undefined beyond the largest observed covariate value. The integral is therefore computed only up to that value, an approximation that can cut off the tail of the survival function and lead to severe bias, particularly under heavy censoring. We propose a hybrid approach that splices together the semiparametric survival estimator with a parametric extension, making it possible to approximate the integral up to infinity. In simulation studies, our proposed approach of extrapolation then imputation substantially reduces the bias seen with existing imputation methods, even when the parametric extension is misspecified. We further demonstrate how imputing with corrected conditional means helps to prioritize patients for future clinical trials.
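To make the truncation problem concrete, the following is a minimal sketch rather than the authors' implementation. It uses the standard identity E[X | X > c] = c + (1 / S(c)) ∫_c^∞ S(x) dx for a covariate X censored at c with survival function S, and it ignores the additional covariates that a Cox model would condition on. The function names (`surv_at`, `step_integral`, `conditional_mean`), the exponential tail, and the rule that picks the tail rate by matching the last estimated survival value are all assumptions made for illustration.

```python
import math


def surv_at(times, surv, t):
    """Value at t of a right-continuous step survival function (S = 1 before times[0])."""
    value = 1.0
    for time, s in zip(times, surv):
        if time <= t:
            value = s
    return value


def step_integral(times, surv, lower, upper):
    """Integrate the step survival function over [lower, upper]; assumes times[0] > 0."""
    knots = [0.0] + list(times) + [math.inf]
    values = [1.0] + list(surv)
    total = 0.0
    for left, right, value in zip(knots[:-1], knots[1:], values):
        lo, hi = max(left, lower), min(right, upper)
        if hi > lo:
            total += value * (hi - lo)
    return total


def conditional_mean(times, surv, c, extend=False):
    """Approximate E[X | X > c] = c + integral_c^inf S(x) dx / S(c), for c <= times[-1].

    extend=False: stop the integral at the largest observed value tau (truncation).
    extend=True:  add the area under an exponential tail beyond tau. The tail and
                  its rate are illustrative assumptions, not the paper's choice.
    """
    tau = times[-1]
    area = step_integral(times, surv, c, tau)
    if extend:
        s_tau = surv[-1]  # must lie strictly between 0 and 1
        rate = -math.log(s_tau) / tau  # exponential curve through (tau, S(tau))
        area += s_tau / rate  # integral of s_tau * exp(-rate * (x - tau)) over x > tau
    return c + area / surv_at(times, surv, c)


# Toy data: a rate-1 exponential survival curve observed only up to tau = 1.
times = [k / 10 for k in range(1, 11)]
surv = [math.exp(-t) for t in times]
c = 0.5

truncated = conditional_mean(times, surv, c, extend=False)
extended = conditional_mean(times, surv, c, extend=True)
print(truncated, extended)
```

For a rate-1 exponential the true conditional mean is c + 1 = 1.5 by memorylessness. In this toy setting the truncated integral falls well short of that value, while the spliced tail brings the estimate close to it. The tail here happens to be correctly specified, so the sketch says nothing about robustness to misspecification.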