Spatial confounding poses a significant challenge in scientific studies involving spatial data, where unobserved spatial variables can influence both treatment and outcome, possibly leading to spurious associations. To address this problem, we introduce SpaCE: The Spatial Confounding Environment, the first toolkit to provide realistic benchmark datasets and tools for systematically evaluating causal inference methods designed to alleviate spatial confounding. Each dataset includes training data, true counterfactuals, a spatial graph with coordinates, and smoothness and confounding scores characterizing the effect of a missing spatial confounder. Each dataset also includes realistic semi-synthetic outcomes and counterfactuals, generated using state-of-the-art machine learning ensembles, following best practices for causal inference benchmarks. The datasets cover real treatments and covariates from diverse domains, including climate, health, and social sciences. SpaCE facilitates an automated end-to-end pipeline, simplifying data loading, experimental setup, and the evaluation of machine learning and causal inference models. The SpaCE project provides several dozen datasets of diverse sizes and spatial complexity. It is publicly available as a Python package, encouraging community feedback and contributions.
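To make the notion of spatial confounding concrete, the following is a minimal toy simulation, not the SpaCE API and not the procedure used to build its datasets. It assumes a smooth unobserved variable `u` defined on a regular grid that drives both treatment and outcome; the function names `simulate_spatial_confounding` and `ols_slope`, the coefficients, and the noise levels are all hypothetical choices made for illustration. Omitting `u` from a simple regression biases the estimated treatment effect, while adjusting for it roughly recovers the true value.

```python
import numpy as np


def simulate_spatial_confounding(n_side=30, true_effect=1.0, seed=0):
    """Toy data on an n_side x n_side grid with a smooth spatial confounder.

    All coefficients below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Grid coordinates in the unit square.
    xs, ys = np.meshgrid(np.linspace(0, 1, n_side), np.linspace(0, 1, n_side))
    coords = np.column_stack([xs.ravel(), ys.ravel()])
    # Smooth spatial variable that would be unobserved in practice.
    u = np.sin(2 * np.pi * coords[:, 0]) + np.cos(2 * np.pi * coords[:, 1])
    # The confounder influences both the treatment and the outcome.
    treatment = 0.8 * u + rng.normal(scale=0.5, size=u.size)
    outcome = true_effect * treatment + 2.0 * u + rng.normal(scale=0.5, size=u.size)
    return coords, u, treatment, outcome


def ols_slope(treatment, outcome, covariates=None):
    """Least-squares coefficient on treatment, optionally adjusting for covariates."""
    columns = [np.ones_like(treatment), treatment]
    if covariates is not None:
        columns.append(covariates)
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return beta[1]  # index 0 is the intercept, index 1 is the treatment


coords, u, treatment, outcome = simulate_spatial_confounding()
naive = ols_slope(treatment, outcome)        # confounder omitted
adjusted = ols_slope(treatment, outcome, u)  # confounder included
print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

In real applications the confounder is missing, so the adjusted regression is unavailable; benchmark datasets with known counterfactuals are what allow methods that only see the coordinates or spatial graph to be scored against the truth.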