Probabilities of causation play a central role in modern decision making. Tian and Pearl first introduced formal definitions and derived tight bounds for three binary probabilities of causation, including the probability of necessity and sufficiency (PNS). However, estimating these probabilities requires both experimental and observational distributions specific to each subpopulation. Such distributions are often impractical to obtain from limited population-level data, or unreliable when they can be obtained. To address this problem, we propose two machine learning models, Exact-MLP and Mask-MLP. They are trained on a small set of reliably estimated subpopulations and predict PNS bounds for all other subpopulations. We validate our models on four structural causal models (SCMs), each evaluated on population-level data with sample sizes between 100k and 200k. Our models achieve average mean absolute errors (MAEs) of roughly 0.03 on the main tasks, reducing MAE by about 80% relative to the corresponding baselines. These results demonstrate both the feasibility of using machine learning models to learn probabilities of causation and the effectiveness of the proposed approach.
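For readers unfamiliar with the PNS bounds mentioned above, the Python sketch below computes the Tian and Pearl lower and upper bounds for a binary treatment X and binary outcome Y. It uses one subpopulation's experimental probabilities P(y_x) and P(y_{x'}) together with its observational joint distribution P(X, Y). The function name `pns_bounds` and its argument names are hypothetical, chosen only for illustration. This is not the Exact-MLP or Mask-MLP code. It only shows the target quantities that such models are described as predicting.

```python
import math


def pns_bounds(p_y_do_x, p_y_do_xp, p_xy, p_xyp, p_xpy, p_xpyp):
    """Tian-Pearl bounds on PNS for binary X and Y (illustrative sketch).

    p_y_do_x  : P(y_x), experimental P(Y=y) under do(X=x)
    p_y_do_xp : P(y_{x'}), experimental P(Y=y) under do(X=x')
    p_xy, p_xyp, p_xpy, p_xpyp : observational joint
        P(x, y), P(x, y'), P(x', y), P(x', y')
    """
    # Sanity check: the observational joint must be a distribution.
    if not math.isclose(p_xy + p_xyp + p_xpy + p_xpyp, 1.0, abs_tol=1e-6):
        raise ValueError("observational joint probabilities must sum to 1")

    p_y = p_xy + p_xpy  # observational marginal P(y)

    # Lower bound: max{0, P(y_x)-P(y_x'), P(y)-P(y_x'), P(y_x)-P(y)}
    lower = max(0.0, p_y_do_x - p_y_do_xp, p_y - p_y_do_xp, p_y_do_x - p_y)

    # Upper bound: min{P(y_x), P(y'_x'), P(x,y)+P(x',y'),
    #                  P(y_x)-P(y_x')+P(x,y')+P(x',y)}
    upper = min(
        p_y_do_x,
        1.0 - p_y_do_xp,  # P(y'_{x'})
        p_xy + p_xpyp,
        p_y_do_x - p_y_do_xp + p_xyp + p_xpy,
    )
    return lower, upper


if __name__ == "__main__":
    lo, hi = pns_bounds(0.7, 0.3, 0.3, 0.2, 0.1, 0.4)
    print(round(lo, 4), round(hi, 4))  # 0.4 0.7
```

All six inputs must be estimated separately for each subpopulation. When a subpopulation has few samples, those estimates, and therefore the bounds, become noisy. This is the situation the proposed models are meant to handle.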