Generalizing treatment effects with incomplete covariates: identifying assumptions and multiple imputation algorithms

We focus on the problem of generalizing a causal effect estimated in a randomized controlled trial (RCT) to a target population described by a set of covariates from observational data. Available methods, such as inverse propensity sampling weighting, are not designed to handle missing values, which are nonetheless common in both data sources. Beyond coupling the assumptions for causal effect identifiability with those for the missing-values mechanism and defining appropriate estimation strategies, one difficulty is to account for the specific structure of the data: two sources, with treatment and outcome available only in the RCT. We propose three multiple imputation strategies to handle missing values when generalizing treatment effects, each handling the multi-source structure of the problem differently (separate imputation, joint imputation with fixed effect, joint imputation ignoring source information). As an alternative to multiple imputation, we also propose a direct estimation approach that treats incomplete covariates as semi-discrete variables. The multiple imputation strategies and this alternative rely on different sets of assumptions about how missing values affect identifiability. We discuss these assumptions and assess the methods through an extensive simulation study. This work is motivated by the analysis of a large registry of over 20,000 major trauma patients and an RCT studying the effect of tranexamic acid administration on mortality in major trauma patients admitted to the ICU. The analysis illustrates how the handling of missing values can affect the conclusion about the effect generalized from the RCT to the target population.
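To make the difference between the three multiple imputation strategies concrete, the following is a minimal sketch of how the two covariate matrices could be arranged before imputation. It is an illustration under stated assumptions, not the implementation used in this work: the function names, the restriction to covariates only, and the crude single-pass stochastic regression imputer are all hypothetical stand-ins for a proper multiple imputation model.

```python
import numpy as np


def stochastic_regression_impute(X, rng):
    """Hypothetical stand-in imputer (an assumption, not the paper's model).

    Missing entries are first filled with column means. Then, for each
    incomplete column, a linear regression on the other columns is fitted on
    the rows where that column is observed, and missing entries are replaced
    by the prediction plus Gaussian noise. Assumes every column has at least
    one observed value.
    """
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    filled = np.where(mask, np.nanmean(X, axis=0), X)
    for j in range(X.shape[1]):
        miss = mask[:, j]
        if not miss.any():
            continue
        others = np.delete(filled, j, axis=1)
        design = np.column_stack([np.ones(len(X)), others])
        coef, *_ = np.linalg.lstsq(design[~miss], X[~miss, j], rcond=None)
        resid = X[~miss, j] - design[~miss] @ coef
        noise = rng.normal(0.0, resid.std(), miss.sum())
        filled[miss, j] = design[miss] @ coef + noise
    return filled


def impute_two_sources(X_rct, X_obs, strategy, n_imputations=5, seed=0):
    """Return a list of (rct, obs) completed covariate matrices.

    strategy is one of:
      "separate"              impute each source on its own
      "joint_fixed_effect"    stack both sources and add a source indicator
      "joint_ignoring_source" stack both sources without any indicator
    """
    rng = np.random.default_rng(seed)
    n, p = X_rct.shape
    m = X_obs.shape[0]
    completed = []
    for _ in range(n_imputations):
        if strategy == "separate":
            rct = stochastic_regression_impute(X_rct, rng)
            obs = stochastic_regression_impute(X_obs, rng)
        elif strategy == "joint_fixed_effect":
            # Source indicator: 1 for RCT rows, 0 for observational rows.
            source = np.concatenate([np.ones(n), np.zeros(m)])[:, None]
            stacked = np.hstack([np.vstack([X_rct, X_obs]), source])
            filled = stochastic_regression_impute(stacked, rng)
            rct, obs = filled[:n, :p], filled[n:, :p]
        elif strategy == "joint_ignoring_source":
            filled = stochastic_regression_impute(np.vstack([X_rct, X_obs]), rng)
            rct, obs = filled[:n], filled[n:]
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        completed.append((rct, obs))
    return completed
```

In a full analysis, a generalization estimator would be applied to each completed pair and the resulting estimates pooled across imputations; that step, and the use of treatment and outcome in the RCT imputation model, are deliberately left out of this sketch.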
