Inverse Probability Weighting of Count Exposures in the Presence of Missing Data: A Simulation Study

Inverse probability of treatment weighting (IPTW) is widely used to estimate causal effects, but guidance is limited for count exposures. It is also unclear how IPTW performs when combined with multiple imputation in this context. In this study, we evaluated five IPTW methods applied to count exposures: multinomial binning, parametric and non-parametric covariate balancing propensity scores (CBPS, npCBPS), generalised boosted models (GBM), and energy balancing. Our simulations were informed by an example using data from the 1970 British Cohort Study, aiming to estimate the effect of psychological distress, measured as a count of symptoms at age 34, on self-reported longstanding illness at age 42. We compared these approaches on bias, coverage, effective sample size, and other metrics under truncated negative binomial and Poisson exposure distributions. We also assessed the performance of Rubin's rules under different missingness mechanisms. Under complete data, multinomial, CBPS, GBM, and energy weights produced low bias and near-nominal coverage, whereas npCBPS produced biased estimates and poor coverage because of extreme weights. When data were missing completely at random, IPTW combined with multiple imputation showed similar performance patterns. When data were missing at random, bias increased with the proportion of missing data; this occurred for both IPTW and covariate-adjusted regression, which may reflect a limitation of the imputation model rather than a failure of IPTW. Overall, these findings support the use of multinomial, CBPS, GBM, and energy weights for count exposures in similar settings, while highlighting trade-offs between these methods and the need for imputation models that accommodate right-truncated, overdispersed counts.

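Two of the quantities named above, effective sample size and pooling by Rubin's rules, can be illustrated with a minimal sketch. The abstract does not state the exact formulas used in the study, so the code below assumes Kish's approximation for effective sample size and the standard total-variance form of Rubin's rules; the function names `effective_sample_size` and `pool_rubin` are hypothetical and chosen only for illustration.

```python
import numpy as np


def effective_sample_size(weights):
    """Kish's approximation (assumed here): (sum of w)^2 / sum of w^2."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their variances with Rubin's rules.

    Returns the pooled point estimate and the total variance,
    T = W + (1 + 1/m) * B, where W is the mean within-imputation
    variance and B is the between-imputation variance.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    pooled_estimate = q.mean()
    within = u.mean()
    between = q.var(ddof=1)  # sample variance across imputations
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance
```

With equal weights the effective sample size equals the number of observations, and it shrinks as a few observations receive extreme weights, which is the mechanism the abstract associates with the poor performance of npCBPS. In a workflow combining IPTW with multiple imputation, one plausible use is to estimate the weighted effect within each imputed dataset and pass the resulting estimates and variances to `pool_rubin`.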