Electronic health records (EHRs) are widely used to study clinical decisions, yet unmeasured confounding remains a persistent challenge. Proxy variables, measured variables that carry information about an unmeasured confounder, offer a potential solution. In EHR data, clinicians already record many such measurements (e.g., vital signs), each revealing something about a patient's underlying health. Despite this, proxy-based methods are rarely used in practice. We introduce a new way to use proxies to adjust for unmeasured confounding. Our approach uses a vector of proxies to construct covariates that capture aspects of the unmeasured confounder; these covariates are then included in a regression model. As one implementation, we use factor analysis followed by regression. We compare this approach with existing methods, including proximal causal inference, across a range of realistic settings. Because assumptions rarely hold exactly in practice, we also study what happens when models are misspecified and variables are used incorrectly, for example when a confounder or instrument is treated as a proxy. Finally, we apply the method to EHR data to estimate the effect of hospital admission for older adults presenting to the emergency department with chest pain, a setting where unmeasured confounding is a substantial concern. This work provides a practical way to use proxies and may help bring proxy-based methods into broader use.
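The two-step idea described in the abstract (build covariates from a vector of proxies, then include them in a regression) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation. The data-generating process, the effect sizes, and the function names `simulate`, `factor_scores`, and `ols_treatment_coef` are all hypothetical. For simplicity, scores on the leading principal component of the proxy correlation matrix stand in for scores from a fitted factor-analysis model.

```python
import numpy as np


def simulate(n=5000, n_proxies=8, true_effect=1.0, seed=0):
    """Simulate one unmeasured confounder u, noisy proxies of u, a treatment, and an outcome.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)  # unmeasured confounder (never given to the analyst)
    loadings = rng.uniform(0.6, 1.0, size=n_proxies)
    # Each proxy is a noisy linear measurement of u (e.g., a vital sign).
    proxies = u[:, None] * loadings + rng.normal(scale=0.5, size=(n, n_proxies))
    # Treatment is more likely when u is high, so u confounds the comparison.
    treatment = (u + rng.normal(size=n) > 0).astype(float)
    outcome = true_effect * treatment + 2.0 * u + rng.normal(size=n)
    return proxies, treatment, outcome


def factor_scores(proxies, n_factors=1):
    """Return scores on the leading eigenvectors of the proxy correlation matrix.

    Assumption: this principal-component step is a simple stand-in for
    factor analysis, used here to keep the sketch dependency-light.
    """
    z = (proxies - proxies.mean(axis=0)) / proxies.std(axis=0)
    corr = np.corrcoef(z, rowvar=False)
    _, eigvecs = np.linalg.eigh(corr)        # eigenvalues come back in ascending order
    top = eigvecs[:, ::-1][:, :n_factors]    # reorder so the largest come first
    return z @ top


def ols_treatment_coef(treatment, outcome, covariates=None):
    """Ordinary least squares coefficient on treatment, optionally adjusting for covariates."""
    columns = [np.ones_like(treatment), treatment]
    if covariates is not None:
        columns.append(covariates)
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return beta[1]


if __name__ == "__main__":
    proxies, treatment, outcome = simulate()
    naive = ols_treatment_coef(treatment, outcome)
    adjusted = ols_treatment_coef(treatment, outcome, factor_scores(proxies))
    print(f"true effect: 1.0, naive: {naive:.2f}, proxy-adjusted: {adjusted:.2f}")
```

In this toy setting the proxy-adjusted estimate should sit much closer to the simulated effect than the unadjusted one, although some residual confounding remains because the constructed covariate measures the confounder with error. The sketch does not cover the misspecification scenarios the abstract mentions, such as treating an instrument as a proxy.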