Causal graphs are usually considered in a 2D plane, but it has rarely been noticed that, across multiple relatively independent timelines, a setting comparatively common in causal machine learning, individual-level differences may lead to Causal Representation Bias (CRB). More importantly, this blind spot has created obstacles to interdisciplinary applications: Deep Learning (DL) methods that overlook CRBs struggle with model generalizability, while statistical analytics have difficulty modeling individual-level features without a geometric global view. In this paper, we first discuss the geometric meaning of causal graphs with respect to multi-dimensional timelines; accordingly, we analyze the scheme of CRB and explicitly define causal model generalization and individualization from a geometric perspective. We also introduce a novel framework, Causal Representation Learning (CRL), to construct a valid learning plane (in latent space) for causal graphs, propose a particular autoencoder architecture to realize it, and experimentally demonstrate its feasibility. The causal data involved include Electronic Healthcare Records (EHR), used to estimate medical effects, and a hydrology dataset, used to forecast environmentally influenced streamflow.