Hybrid modeling integrates machine learning with scientific knowledge to enhance interpretability, generalization, and adherence to natural laws. However, equifinality and regularization biases make these goals difficult to achieve in practice. This paper introduces a novel approach to estimating hybrid models within a causal inference framework, specifically employing Double Machine Learning (DML) to estimate causal effects. We showcase its use in the Earth sciences on two problems related to carbon dioxide fluxes. For the $Q_{10}$ model, we demonstrate that DML-based hybrid modeling is superior to end-to-end deep neural network (DNN) approaches in estimating causal parameters: it is efficient, robust to bias from regularization methods, and circumvents equifinality. Applied to carbon flux partitioning, our approach flexibly accommodates heterogeneous causal effects. The study emphasizes the necessity of explicitly defining causal graphs and relationships and advocates this as a general best practice. We encourage the continued exploration of causality in hybrid models for more interpretable and trustworthy results in knowledge-guided machine learning.
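To make the DML idea concrete, the following is a minimal sketch on synthetic data, not the authors' implementation. It assumes the $Q_{10}$ model takes the common form $R = R_b(x)\,Q_{10}^{(T - T_{ref})/10}$ and that it is log-transformed into a partially linear model $\log R = \theta D + g(x) + \varepsilon$ with $D = (T - T_{ref})/10$ and $\theta = \log Q_{10}$. The function names (`fit_predict_poly`, `dml_partially_linear`), the polynomial nuisance learner, the single driver `x`, and all simulated values are illustrative assumptions.

```python
import numpy as np


def fit_predict_poly(x_train, y_train, x_test, degree=5):
    """Hypothetical nuisance learner: polynomial least squares in one driver."""
    coefs = np.polyfit(x_train, y_train, degree)
    return np.polyval(coefs, x_test)


def dml_partially_linear(y, d, x, n_folds=5, seed=0):
    """Cross-fitted DML estimate of theta in y = theta * d + g(x) + noise.

    Both y and d are residualized on x with models fitted on the other
    folds; theta is then the OLS slope of the y-residuals on the d-residuals.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    y_res = np.empty_like(y)
    d_res = np.empty_like(d)
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False  # fit nuisances only on the held-in folds
        y_res[test_idx] = y[test_idx] - fit_predict_poly(x[train], y[train], x[test_idx])
        d_res[test_idx] = d[test_idx] - fit_predict_poly(x[train], d[train], x[test_idx])
    return np.sum(d_res * y_res) / np.sum(d_res ** 2)


# Synthetic example (all values are assumptions made for illustration).
rng = np.random.default_rng(42)
n = 2000
true_q10 = 1.5
x = rng.uniform(0.0, 1.0, n)                      # a scaled driver of base respiration
temp = 10.0 + 10.0 * x + rng.normal(0.0, 3.0, n)  # temperature, confounded with x
d = (temp - 15.0) / 10.0                          # (T - T_ref) / 10 with T_ref = 15
log_rb = 0.5 + np.sin(3.0 * x)                    # nonlinear log base respiration g(x)
y = log_rb + d * np.log(true_q10) + rng.normal(0.0, 0.1, n)  # log respiration

theta_hat = dml_partially_linear(y, d, x)
q10_hat = float(np.exp(theta_hat))
print(f"estimated Q10: {q10_hat:.3f} (simulated truth: {true_q10})")
```

Because the nuisance functions are fitted on held-out folds and only the residuals enter the final regression, the estimate of $\theta$ is less sensitive to regularization in the nuisance learners than a jointly trained end-to-end model would be. In practice the polynomial fit would be replaced by a flexible learner over all available drivers.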