Causal graphs are usually considered in a 2D plane. It has rarely been noticed, however, that when data span multiple relatively independent timelines, a setting that is comparatively common in causal machine learning, individual-level differences may lead to Causal Representation Bias (CRB). More importantly, this blind spot has hindered interdisciplinary applications: Deep Learning (DL) methods that overlook CRB struggle to generalize, while statistical analytics, lacking a global geometric view, have difficulty modeling individual-level features. In this paper, we first discuss the geometric meaning of causal graphs with respect to multi-dimensional timelines; accordingly, we analyze the mechanism of CRB and explicitly define causal model generalization and individualization from a geometric perspective. We also introduce a novel framework, Causal Representation Learning (CRL), to construct a valid learning plane (in latent space) for causal graphs, propose a particular autoencoder architecture to realize it, and demonstrate its feasibility experimentally. The causal data involved include Electronic Healthcare Records (EHR), used to estimate medical effects, and a hydrology dataset, used to forecast environmentally influenced streamflow.