Reconciling Predictive Multiplicity in Practice

Many machine learning applications predict individual probabilities, such as the likelihood that a person develops a particular illness. Since these probabilities are unknown, a key question is how to handle cases in which different models trained on the same dataset produce different predictions for the same individuals. This issue is exemplified by the model multiplicity (MM) phenomenon, where a set of comparable models yields inconsistent predictions. Roth, Tolbert, and Weinstein recently introduced a reconciliation procedure, the Reconcile algorithm, to address this problem. Given two disagreeing models, the algorithm uses their disagreement to falsify and improve at least one of them. In this paper, we empirically analyze the Reconcile algorithm on five widely used fairness datasets: COMPAS, Communities and Crime, Adult, Statlog (German Credit Data), and the ACS Dataset. We examine how Reconcile fits within the model multiplicity literature and compare it to existing MM solutions, demonstrating its effectiveness. We also discuss potential theoretical and practical improvements to the Reconcile algorithm. Finally, we extend the Reconcile algorithm to causal inference, where competing estimators can likewise disagree on specific conditional average treatment effect (CATE) values. We present the first such extension, analyze its theoretical properties, and conduct empirical tests. Our results confirm the practical effectiveness of Reconcile and its applicability across various domains.
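To make the "falsify and improve" step concrete, the following is a minimal sketch of a Reconcile-style loop on a finite labelled sample. It is an illustration under stated assumptions, not the authors' implementation: the function name `reconcile_sketch`, the parameters `eps`, `alpha`, and `max_rounds`, and the simple mean-shift update are all hypothetical choices made here, and details of the published algorithm may differ.

```python
import numpy as np


def reconcile_sketch(p1, p2, y, eps=0.1, alpha=0.05, max_rounds=1000):
    """Illustrative Reconcile-style loop (hypothetical helper, not the paper's code).

    p1, p2 : predicted probabilities of two models on the same points
    y      : observed binary outcomes for those points
    eps    : two predictions "disagree" when they differ by more than eps
    alpha  : stop once the disagreeing fraction of points drops below alpha
    """
    p1 = np.asarray(p1, dtype=float).copy()
    p2 = np.asarray(p2, dtype=float).copy()
    y = np.asarray(y, dtype=float)

    for _ in range(max_rounds):
        diff = p1 - p2
        disagree = np.abs(diff) > eps
        if disagree.mean() < alpha:
            break  # the models now agree almost everywhere

        # Split the disagreement region by which model predicts higher,
        # and work on the larger of the two parts.
        higher = disagree & (diff > 0)
        lower = disagree & (diff < 0)
        region = higher if higher.sum() >= lower.sum() else lower

        # Average gap between outcomes and each model's predictions on the region.
        gap1 = y[region].mean() - p1[region].mean()
        gap2 = y[region].mean() - p2[region].mean()

        # The model with the larger gap is the one the data falsifies here;
        # shift its predictions on the region toward the observed mean.
        if abs(gap1) >= abs(gap2):
            p1[region] = np.clip(p1[region] + gap1, 0.0, 1.0)
        else:
            p2[region] = np.clip(p2[region] + gap2, 0.0, 1.0)

    return p1, p2
```

In this sketch, every point in the chosen region has the two models differing by more than `eps` in the same direction, so at least one model's average prediction on that region is off from the observed mean by more than `eps / 2`. Correcting that model lowers its squared error on the sample, which is why the loop makes progress each round.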

