Information-Theoretic Causal Bounds under Unmeasured Confounding

We develop a data-driven, information-theoretic framework for sharp partial identification of causal effects under unmeasured confounding. Existing approaches have at least one of four limitations: they rely on restrictive assumptions, such as bounded or discrete outcomes; they require external inputs (for example, instrumental variables, proxies, or user-specified sensitivity parameters); they need a fully specified structural causal model; or they target only population-level averages and neglect covariate-conditional treatment effects. We overcome all four limitations simultaneously by establishing novel data-driven divergence bounds. Our key theoretical result shows that the f-divergence between the observational distribution P(Y | A = a, X = x) and the interventional distribution P(Y | do(A = a), X = x) is upper-bounded by a function of the propensity score alone. This result enables sharp partial identification of conditional causal effects directly from observational data, without external sensitivity parameters, auxiliary variables, full structural specifications, or outcome boundedness assumptions. For practical implementation, we develop a semiparametric estimator satisfying Neyman orthogonality (Chernozhukov et al., 2018), which ensures √n-consistent inference even when nuisance functions are estimated with flexible machine learning methods. Simulation studies and real-world data applications, implemented in the GitHub repository (https://github.com/yonghanjung/Information-Theretic-Bounds), demonstrate that our framework provides tight and valid causal bounds across a wide range of data-generating processes.
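
To give some intuition for why a divergence bound of this kind can depend on the propensity score alone, consider the KL special case. Write e(x) = P(A = a | X = x), and let Y_a denote the potential outcome under treatment a (notation introduced here for illustration). Under consistency and positivity, P(Y | do(A = a), X = x) = e(x) P(Y | A = a, X = x) + (1 - e(x)) P(Y_a | A ≠ a, X = x), so the density ratio of the observational to the interventional distribution is at most 1/e(x), and hence KL(P(Y | A = a, x) || P(Y | do(A = a), x)) ≤ -log e(x). The sketch below checks this inequality numerically in a toy discrete model with a hidden binary confounder U and no covariates. It is only an illustrative assumption-laden example: the function names, the toy parameters, and the choice of KL divergence in this direction are ours, and it is not the general f-divergence result or the estimator from the repository.

```python
import math


def kl(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def obs_and_do(p_u, p_a1_given_u, p_y_given_au, a=1):
    """Return (e, P(Y | A=a), P(Y | do(A=a))) in a toy model with hidden U.

    p_u[u]                : P(U = u)
    p_a1_given_u[u]       : P(A = 1 | U = u)
    p_y_given_au[a][u][y] : P(Y = y | A = a, U = u)
    """
    n_y = len(p_y_given_au[a][0])
    # P(A = a | U = u) for the requested treatment arm.
    pa_u = [p if a == 1 else 1.0 - p for p in p_a1_given_u]
    # Propensity score e = P(A = a), marginalizing over the hidden U.
    e = sum(pu * pa for pu, pa in zip(p_u, pa_u))
    # Observational law: U is weighted by its posterior P(U = u | A = a).
    p_obs = [
        sum(p_u[u] * pa_u[u] / e * p_y_given_au[a][u][y] for u in range(len(p_u)))
        for y in range(n_y)
    ]
    # Interventional law: U is weighted by its prior P(U = u).
    p_do = [
        sum(p_u[u] * p_y_given_au[a][u][y] for u in range(len(p_u)))
        for y in range(n_y)
    ]
    return e, p_obs, p_do


# Hypothetical toy parameters: U shifts both treatment uptake and the outcome.
P_U = [0.5, 0.5]
P_A1_GIVEN_U = [0.2, 0.8]
P_Y_GIVEN_AU = [
    [[0.7, 0.3], [0.4, 0.6]],  # a = 0: rows index u, columns index y
    [[0.9, 0.1], [0.3, 0.7]],  # a = 1
]

e, p_obs, p_do = obs_and_do(P_U, P_A1_GIVEN_U, P_Y_GIVEN_AU, a=1)
divergence = kl(p_obs, p_do)
bound = -math.log(e)
print(f"e = {e:.3f}, KL(obs || do) = {divergence:.4f}, -log e = {bound:.4f}")
```

In this toy model the divergence is computed with knowledge of U, which is unavailable in practice; the point is that the upper bound -log e uses only the propensity score, which is identifiable from observational data.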
