This paper develops a semiparametric Bayesian instrumental variable method for estimating the causal effect of an endogenous variable in the presence of unobserved confounders and measurement errors when the outcome is partly interval-censored time-to-event data, in which event times are observed exactly for some subjects but are left-censored, right-censored, or interval-censored for others. Our method is based on a two-stage Dirichlet process mixture instrumental variable (DPMIV) model, which jointly models the first-stage random error term for the exposure variable and the second-stage random error term for the time-to-event outcome using a bivariate Gaussian Dirichlet process mixture (DPM) model. The DPM model can be broadly understood as a mixture model with an unspecified number of Gaussian components; it relaxes the normal error assumption and allows the number of mixture components to be determined by the data. We develop an MCMC algorithm for the DPMIV model tailored to partly interval-censored data and conduct extensive simulations to compare the performance of DPMIV with that of competing methods. The simulations show that the proposed method is robust to different error distributions and can outperform its parametric counterpart in various scenarios. We further demonstrate the approach on UK Biobank data, investigating the causal effect of systolic blood pressure on the time to development of cardiovascular disease from the onset of diabetes mellitus.
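To make the description of the DPM error model concrete, the following is a minimal sketch of how a bivariate error pair (first-stage error, second-stage error) could be drawn from a truncated stick-breaking approximation to a Dirichlet process mixture of Gaussians. It is not the paper's MCMC algorithm and does not handle censoring; it only illustrates the prior structure. The function names, the truncation level `n_components`, the concentration parameter `alpha`, and the base measure for the component means and correlations are all assumptions chosen for illustration.

```python
import numpy as np


def stick_breaking_weights(alpha, n_components, rng):
    """Truncated stick-breaking weights for a Dirichlet process (illustrative)."""
    betas = rng.beta(1.0, alpha, size=n_components)
    betas[-1] = 1.0  # close the stick so the truncated weights sum to one
    # Fraction of the stick remaining before each break.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining


def sample_dpm_errors(n, alpha=1.0, n_components=20, seed=0):
    """Draw n bivariate error pairs from a truncated Gaussian DP mixture.

    Column 0 plays the role of the first-stage (exposure) error and
    column 1 the second-stage (outcome) error. The base measure below
    is a hypothetical choice, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    weights = stick_breaking_weights(alpha, n_components, rng)

    # Assumed base measure: means ~ N(0, 2^2 I), unit variances,
    # and a component-specific correlation drawn uniformly.
    means = rng.normal(0.0, 2.0, size=(n_components, 2))
    rho = rng.uniform(-0.8, 0.8, size=n_components)
    covs = np.empty((n_components, 2, 2))
    covs[:, 0, 0] = 1.0
    covs[:, 1, 1] = 1.0
    covs[:, 0, 1] = rho
    covs[:, 1, 0] = rho

    # Assign each subject to a component, then draw its error pair.
    labels = rng.choice(n_components, size=n, p=weights)
    chol = np.linalg.cholesky(covs)  # batched Cholesky factors
    z = rng.standard_normal((n, 2))
    errors = means[labels] + np.einsum("nij,nj->ni", chol[labels], z)
    return errors, labels, weights


if __name__ == "__main__":
    errors, labels, weights = sample_dpm_errors(1000, alpha=1.0)
    print("occupied components:", np.unique(labels).size)
    print("empirical correlation:", np.corrcoef(errors.T)[0, 1])
```

With a small `alpha`, most of the weight falls on a few components, so the number of occupied components is typically far below the truncation level; this is the sense in which the number of components is left to the data. The correlation between the two columns is what carries the endogeneity that the instrumental variable analysis is meant to address.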