Interval censoring frequently occurs in studies of chronic diseases, where disease status is inferred from intermittently collected biomarkers. Although many methods have been developed to analyze such data, they typically assume perfect disease diagnosis. This assumption often fails in practice because clinical diagnosis of cognitive function is inherently imperfect and because biomarkers, such as those measured in cerebrospinal fluid, are subject to measurement error. In this work, we introduce a semiparametric modeling framework based on the Cox proportional hazards model for interval-censored data in the presence of inaccurate disease diagnosis. Our model incorporates the sensitivity and specificity of the diagnosis to account for uncertainty about whether the observed interval truly contains the disease onset. The framework also accommodates settings with a terminal event and settings in which an accurate diagnosis is available, for example through postmortem analysis. We propose nonparametric maximum likelihood estimation for inference and develop an efficient EM algorithm to make the computation feasible. The regression coefficient estimators are shown to be asymptotically normal and to attain the semiparametric efficiency bound. We evaluate the approach through extensive simulation studies and an application assessing Alzheimer's disease (AD) risk. In the application, we find that amyloid-beta is significantly associated with AD, whereas Tau is predictive of both AD and mortality.
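
To illustrate how sensitivity and specificity can enter such a model, the following is a minimal sketch of the marginal probability of a positive diagnosis at a single visit, assuming onset follows a Cox model with baseline cumulative hazard Λ0 and that the test result depends only on true disease status at the visit. It is not the likelihood used in this work; the function names, the linear baseline hazard, and all numeric values are hypothetical.

```python
import math


def onset_cdf(t, x, beta, baseline_cum_hazard):
    """P(onset <= t | x) under a Cox model: 1 - exp(-Lambda0(t) * exp(beta * x))."""
    return 1.0 - math.exp(-baseline_cum_hazard(t) * math.exp(beta * x))


def prob_positive_diagnosis(t, x, beta, baseline_cum_hazard, sensitivity, specificity):
    """Marginal probability that an imperfect test reads positive at visit time t.

    Mixes the true-positive rate (sensitivity) over diseased subjects with the
    false-positive rate (1 - specificity) over disease-free subjects.
    """
    f = onset_cdf(t, x, beta, baseline_cum_hazard)
    return sensitivity * f + (1.0 - specificity) * (1.0 - f)


def baseline(t):
    """Hypothetical baseline cumulative hazard, linear in time (assumption)."""
    return 0.05 * t


# Hypothetical covariate, coefficient, and test accuracy values.
p = prob_positive_diagnosis(
    t=10.0, x=1.0, beta=0.5,
    baseline_cum_hazard=baseline,
    sensitivity=0.9, specificity=0.95,
)
print(f"P(positive diagnosis at t=10) = {p:.4f}")
```

With sensitivity and specificity both equal to 1, the expression reduces to the onset distribution function, which corresponds to the perfect-diagnosis assumption made by standard interval-censoring methods.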