Randomized Controlled Trials (RCTs) may suffer from limited scope. In particular, samples may be unrepresentative: some RCTs over- or under-sample individuals with certain characteristics compared to the target population, for which one wants conclusions on treatment effectiveness. Re-weighting trial individuals to match the target population can improve the treatment effect estimation. In this work, we establish exact expressions for the bias and variance of such reweighting procedures (also called Inverse Propensity of Sampling Weighting, IPSW) in the presence of categorical covariates, for any sample size. These results allow us to compare the theoretical performance of different versions of IPSW estimates. Our results also show how the performance (bias, variance, and quadratic risk) of IPSW estimates depends on the two sample sizes (RCT and target population). A by-product of our work is a proof of consistency of IPSW estimates. The results further reveal that IPSW performance improves when the trial probability of being treated is estimated (rather than using its oracle counterpart). In addition, we study the choice of variables: how including covariates that are not necessary for identifiability of the causal effect may impact the asymptotic variance. Including covariates that are shifted between the two samples but are not treatment effect modifiers increases the variance, while including covariates that are not shifted but are treatment effect modifiers does not. We illustrate all the takeaways in a didactic example and in a semi-synthetic simulation inspired by critical care medicine.
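To make the reweighting idea concrete, here is a minimal sketch of an IPSW-style estimate with a single categorical covariate. It is an illustration under stated assumptions, not the authors' implementation: the function name `ipsw_estimate`, its arguments, and the toy data are all hypothetical. The trial treatment probability `pi` is taken as known (the oracle version), and the weights are ratios of empirical stratum frequencies in the target sample and in the trial.

```python
import numpy as np


def ipsw_estimate(x_trial, a_trial, y_trial, x_target, pi=0.5):
    """Hypothetical IPSW sketch for one categorical covariate.

    x_trial, a_trial, y_trial: covariate, binary treatment, outcome in the RCT.
    x_target: covariate values observed in the target population sample.
    pi: assumed known probability of treatment in the trial (oracle version).
    """
    x_trial = np.asarray(x_trial)
    a = np.asarray(a_trial, dtype=float)
    y = np.asarray(y_trial, dtype=float)
    x_target = np.asarray(x_target)

    # Empirical stratum frequencies in the trial and in the target sample.
    levels_r, counts_r = np.unique(x_trial, return_counts=True)
    p_trial = dict(zip(levels_r.tolist(), counts_r / len(x_trial)))
    levels_t, counts_t = np.unique(x_target, return_counts=True)
    p_target = dict(zip(levels_t.tolist(), counts_t / len(x_target)))

    # Weight each trial individual by the target/trial frequency ratio;
    # strata never seen in the target sample get weight zero.
    weights = np.array(
        [p_target.get(x, 0.0) / p_trial[x] for x in x_trial.tolist()]
    )

    # Horvitz-Thompson-style contrast for each trial individual.
    contrast = a * y / pi - (1.0 - a) * y / (1.0 - pi)
    return float(np.mean(weights * contrast))


# Toy data: stratum 0 has effect 1, stratum 1 has effect 3.
# The trial over-samples stratum 0 (4 of 6); the target is mostly stratum 1.
x_rct = [0, 0, 0, 0, 1, 1]
a_rct = [1, 0, 1, 0, 1, 0]
y_rct = [1, 0, 1, 0, 3, 0]
x_pop = [0, 1, 1, 1]

reweighted = ipsw_estimate(x_rct, a_rct, y_rct, x_pop)   # about 2.5
unweighted = ipsw_estimate(x_rct, a_rct, y_rct, x_rct)   # about 1.67
print(reweighted, unweighted)
```

In this toy data the target average effect is 0.25 * 1 + 0.75 * 3 = 2.5, which the reweighted estimate recovers, whereas passing the trial covariates as the target (all weights equal to 1) gives the trial-population value of about 1.67. Replacing `pi` with the observed treated fraction would correspond to the estimated-probability variant mentioned in the abstract; that variant is not implemented here.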