Ensuring trustworthiness in machine learning (ML) models is a multi-dimensional task. Beyond the traditional notion of predictive performance, other notions such as privacy, fairness, robustness to distribution shift, adversarial robustness, interpretability, explainability, and uncertainty quantification are important considerations to evaluate and, where deficient, improve. However, these sub-disciplines or 'pillars' of trustworthiness have largely developed independently, which has limited our understanding of their interactions in real-world ML pipelines. In this paper, focusing specifically on compositions of functions arising from the different pillars, we aim to reduce this gap, develop new insights for trustworthy ML, and answer questions such as the following. Does the composition of multiple fairness interventions result in a fairer model compared to a single intervention? How do bias mitigation algorithms for fairness affect local post-hoc explanations? Does a defense algorithm for untargeted adversarial attacks continue to be effective when composed with a privacy transformation? Toward this end, we report initial empirical results and new insights from 9 different compositions of functions (or pipelines) on 7 real-world datasets along two dimensions of trustworthiness: fairness and explainability. We also report progress, and implementation choices, on an extensible composer tool to encourage the combination of functionalities from multiple pillars. To date, the tool supports bias mitigation algorithms for fairness and post-hoc explainability methods. We hope this line of work encourages the thoughtful consideration of multiple pillars when attempting to formulate and resolve a trustworthiness problem.