Nonparametric Identification and Inference for Counterfactual Distributions with Confounding

We develop nonparametric identification and semiparametric estimation results for joint potential outcome distributions in the presence of confounding. First, in settings with observed confounding, we derive tighter, covariate-informed bounds on the joint distribution by leveraging conditional copulas. To overcome the non-differentiability of the min/max operators that define these bounds, we establish asymptotic properties for both a direct estimator under a polynomial margin condition and a smooth approximation based on the log-sum-exp operator, facilitating valid inference for individual-level effects under the canonical rank-preserving assumption. Second, we tackle the challenge of unmeasured confounding by introducing a causal representation learning framework. Using instrumental variables, we prove the nonparametric identifiability of the latent confounding subspace under injectivity and completeness conditions. We develop a "triple machine learning" estimator that employs a cross-fitting scheme to sequentially handle the learned representation, the nuisance parameters, and the target functional. We characterize its asymptotic distribution, including the variance inflation induced by representation learning error, and provide conditions for semiparametric efficiency. We also propose a practical VAE-based algorithm for confounding representation learning. Simulations and a real-world analysis validate the effectiveness of the proposed methods. By bridging classical semiparametric theory with modern representation learning, this work provides a robust statistical foundation for distributional and counterfactual inference in complex causal systems.
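
To illustrate the smoothing idea mentioned above, the following is a minimal sketch of a log-sum-exp approximation to the min/max operators. It is not the paper's estimator. The function names, the temperature parameter `t`, and the choice of classical Fréchet-Hoeffding copula bounds as the example are all assumptions made for illustration; the abstract does not state the exact form of the bounds.

```python
import numpy as np

def smooth_max(values, t=50.0):
    """Log-sum-exp approximation to max(values).

    For n inputs it satisfies max <= smooth_max <= max + log(n) / t,
    so a larger temperature t (hypothetical tuning parameter) gives a tighter fit.
    """
    v = np.asarray(values, dtype=float)
    m = v.max()  # subtract the max before exponentiating for numerical stability
    return m + np.log(np.exp(t * (v - m)).sum()) / t

def smooth_min(values, t=50.0):
    """Smooth approximation to min(values), obtained from smooth_max by negation."""
    return -smooth_max(-np.asarray(values, dtype=float), t)

def frechet_bounds(u, v):
    """Exact Frechet-Hoeffding bounds on a copula C(u, v) (illustrative choice)."""
    return max(u + v - 1.0, 0.0), min(u, v)

def smooth_frechet_bounds(u, v, t=50.0):
    """Differentiable surrogate for the bounds above (assumed form, for illustration)."""
    lower = smooth_max([u + v - 1.0, 0.0], t)
    upper = smooth_min([u, v], t)
    return lower, upper

if __name__ == "__main__":
    print(frechet_bounds(0.7, 0.6))
    print(smooth_frechet_bounds(0.7, 0.6, t=50.0))
```

With two arguments, each smoothed bound differs from its exact counterpart by at most log(2)/t and moves toward the interior of the exact interval, so the surrogate trades a small, controllable bias for differentiability.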
