Causal graphs are usually drawn in a 2D plane. It has rarely been noticed, however, that when data span multiple relatively independent timelines, a setting that is comparatively common in causal machine learning, individual-level differences may lead to Causal Representation Bias (CRB). More importantly, this blind spot has hindered interdisciplinary applications: Deep Learning (DL) methods that overlook CRB struggle to generalize, while statistical analyses have difficulty modeling individual-level features without a global geometric view. In this paper, we first discuss the geometric meaning of causal graphs with respect to multi-dimensional timelines. On that basis, we analyze the mechanism of CRB and explicitly define causal model generalization and individualization from a geometric perspective. We also propose a novel framework, Causal Representation Learning (CRL), which constructs a valid learning plane (in latent space) for causal graphs, present a particular autoencoder architecture to realize it, and demonstrate its feasibility experimentally. The causal data involved include Electronic Healthcare Records (EHR), used to estimate medical effects, and a hydrology dataset, used to forecast environmentally influenced streamflow.