Spatial confounding poses a significant challenge in scientific studies involving spatial data, where unobserved spatial variables can influence both treatment and outcome, possibly leading to spurious associations. To address this problem, we introduce SpaCE: The Spatial Confounding Environment, the first toolkit to provide realistic benchmark datasets and tools for systematically evaluating causal inference methods designed to alleviate spatial confounding. Each dataset includes training data, true counterfactuals, a spatial graph with coordinates, and smoothness and confounding scores characterizing the effect of a missing spatial confounder. Each dataset also includes realistic semi-synthetic outcomes and counterfactuals, generated using state-of-the-art machine learning ensembles, following best practices for causal inference benchmarks. The datasets cover real treatments and covariates from diverse domains, including climate, health, and social sciences. SpaCE provides an automated end-to-end pipeline that simplifies data loading, experimental setup, and the evaluation of machine learning and causal inference models. The project offers several dozen datasets of diverse sizes and spatial complexity. It is publicly available as a Python package, encouraging community feedback and contributions.
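To make the notion of spatial confounding concrete, the following is a minimal simulation sketch and not part of the SpaCE package or its API. It assumes a smooth unobserved spatial variable on a regular grid that drives both treatment and outcome through linear relationships. All names (`simulate`, `ols_slope`, `tau`, `beta`) and the data-generating process are hypothetical choices made for illustration only.

```python
import numpy as np


def simulate(n_side=40, tau=1.0, beta=2.0, seed=0):
    """Simulate a grid with a smooth unobserved spatial confounder.

    Hypothetical data-generating process (an assumption, not SpaCE's):
      u(s) = sin(x) + cos(y)            smooth spatial confounder
      a    = u + noise                  treatment depends on u
      y    = tau * a + beta * u + noise outcome depends on a and u
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 2.0 * np.pi, n_side)
    xs, ys = np.meshgrid(grid, grid)
    u = (np.sin(xs) + np.cos(ys)).ravel()
    a = u + rng.normal(scale=1.0, size=u.size)
    y = tau * a + beta * u + rng.normal(scale=1.0, size=u.size)
    return u, a, y


def ols_slope(y, *regressors):
    """Return the least-squares coefficient on the first regressor."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]  # index 0 is the intercept


u, a, y = simulate()
naive = ols_slope(y, a)        # confounder omitted: biased estimate
adjusted = ols_slope(y, a, u)  # confounder included: close to tau

print(f"true effect: 1.0, naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```

In this toy setting the naive regression attributes part of the confounder's influence to the treatment, so its estimate overshoots the true effect, while adjusting for the confounder recovers it. In practice the confounder is unobserved, which is why benchmarks with known counterfactuals are useful for comparing methods.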