Causal machine learning (ML) algorithms recover graphical structures that describe cause-and-effect relationships. The causal representation provided by these algorithms enables transparency and explainability, which are necessary for decision making in critical real-world problems. Yet causal ML has had limited impact in practice compared to associational ML. This paper investigates the challenges of causal ML with application to UK COVID-19 pandemic data. We collate data from various public sources and investigate what different structure learning algorithms learn from these data. We explore the impact of different data formats on algorithms spanning different classes of learning, and assess the results produced by each algorithm, and by groups of algorithms, in terms of graphical structure, model dimensionality, sensitivity analysis, confounding variables, and predictive and interventional inference. We use these results to highlight open problems in causal structure learning and directions for future research. To facilitate future work, we make all graphs, models, data sets, and source code publicly available online.