Causality lays the foundation for the trajectory of our world. Causal inference (CI), which aims to infer the intrinsic causal relations among variables of interest, has emerged as a crucial research topic. Nevertheless, when important variables (e.g., confounders, mediators, or exogenous variables) are unobserved, the reliability of CI methods is severely compromised. The issue may arise from the inherent difficulty of measuring these variables. Additionally, in observational studies where variables are passively recorded, certain covariates might be inadvertently omitted by the experimenter. Depending on the type of unobserved variable and the specific CI task, careless handling of these latent variables can lead to various consequences, including biased estimation of causal effects, incomplete understanding of causal mechanisms, and a lack of individual-level causal consideration. In this survey, we provide a comprehensive review of recent developments in CI with latent variables. We start by discussing traditional CI techniques that assume the variables of interest are fully observed. Afterward, under a taxonomy of circumvention-based and inference-based methods, we provide an in-depth discussion of various CI strategies for handling latent variables, covering the tasks of causal effect estimation, mediation analysis, counterfactual reasoning, and causal discovery. Furthermore, we generalize the discussion to graph data, where interference among units may exist. Finally, we offer new perspectives for the further advancement of CI with latent variables, especially the new opportunities arising in the era of large language models (LLMs).