The DAG (Directed Acyclic Graph) used in causal inference does not distinguish causal effects from correlated changes, and the population-level effect is usually approximated by averaging correlations over all individuals. As AI (Artificial Intelligence) enables large-scale structural modeling on big data, complex hidden confounding makes these approximation errors no longer negligible: they snowball into considerable modeling bias. This Causal Representation Bias (CRB) leads to many problems, including ungeneralizable causal models, unrevealed individual-level features, and causal knowledge that is hardly utilized in DL (Deep Learning). In short, the DAG must be redefined to enable a new framework for causal AI. Observational time series in statistics can only represent correlated changes, whereas a DL-based autoencoder can represent them as individualized feature changes in latent space and thereby estimate causal effects directly. In this paper, we introduce the redefined do-DAG to visualize CRB, propose a generic solution, the Causal Representation Learning (CRL) framework, along with a novel architecture for its realization, and experimentally verify its feasibility.