A directed acyclic graph (DAG) provides valuable prior knowledge that is often discarded in machine learning regression tasks. We show that the independences arising from the presence of collider structures in DAGs provide meaningful inductive biases, which constrain the regression hypothesis space and improve predictive performance. We introduce collider regression, a framework to incorporate probabilistic causal knowledge from a collider into a regression problem. When the hypothesis space is a reproducing kernel Hilbert space, we prove a strictly positive generalisation benefit under mild assumptions and provide closed-form estimators of the empirical risk minimiser. Experiments on synthetic and climate model data demonstrate the performance gains of the proposed methodology.
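As a minimal illustration of how a collider constrains the hypothesis space, assume the simplest collider X1 → X2 ← Y. By d-separation, Y is independent of X1, so the regression function f*(x) = E[Y | X = x] must satisfy E[f*(X) | X1] = E[Y | X1] = E[Y] by the tower property. The sketch below enforces this constraint on a fitted predictor by subtracting a kernel ridge estimate of E[f(X) | X1] and adding back the empirical mean of Y. The helper names (rbf_kernel, krr_fit_predict, collider_project), the RBF kernel, the lengthscale and the regularisation values are hypothetical choices for illustration; this is a sketch under those assumptions, not the paper's estimator.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def krr_fit_predict(X_train, y_train, X_test, reg=1e-2, lengthscale=1.0):
    # Kernel ridge regression in closed form: alpha = (K + n * reg * I)^{-1} y
    n = X_train.shape[0]
    K = rbf_kernel(X_train, X_train, lengthscale)
    alpha = np.linalg.solve(K + n * reg * np.eye(n), y_train)
    return rbf_kernel(X_test, X_train, lengthscale) @ alpha


def collider_project(f_train, f_test, x1_train, x1_test, y_mean, reg=1e-2):
    # Hypothetical helper: subtract a KRR estimate of E[f(X) | X1] and add back
    # E[Y], so the corrected predictor's conditional mean given X1 is
    # approximately the constant E[Y], as the collider implies
    cond_mean = krr_fit_predict(x1_train, f_train, x1_test, reg)
    return f_test - cond_mean + y_mean


if __name__ == "__main__":
    # Simulated collider X1 -> X2 <- Y, so that Y is independent of X1
    rng = np.random.default_rng(0)
    n_train, n_test = 200, 1000
    n = n_train + n_test
    x1 = rng.normal(size=n)
    y = rng.normal(size=n)
    x2 = x1 + y + 0.1 * rng.normal(size=n)
    X = np.column_stack([x1, x2])
    tr, te = slice(0, n_train), slice(n_train, n)

    # Unconstrained baseline: KRR on both features
    f_train = krr_fit_predict(X[tr], y[tr], X[tr])
    f_test = krr_fit_predict(X[tr], y[tr], X[te])

    # Predictor corrected to respect E[f(X) | X1] = E[Y]
    g_test = collider_project(f_train, f_test, x1[tr, None], x1[te, None], y[tr].mean())

    print("baseline MSE:", np.mean((f_test - y[te]) ** 2))
    print("projected MSE:", np.mean((g_test - y[te]) ** 2))
```

The correction only removes the part of a predictor that can be explained by X1 alone, which the collider says carries no information about Y; whether it lowers test error on a given dataset depends on the sample size and the kernel choices assumed here.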