fairml: A Statistician's Take on Fair Machine Learning Modelling

The adoption of machine learning in applications where fairness and accountability are crucial has led to a large number of model proposals in the literature, largely formulated as optimisation problems with constraints that reduce or eliminate the effect of sensitive attributes on the response. While this approach is very flexible from a theoretical perspective, the resulting models are somewhat black-box in nature: very little can be said about their statistical properties, about best practices for their applied use, or about how they can be extended to problems other than those they were originally designed for. Furthermore, estimating each model requires a bespoke implementation involving an appropriate solver, which is less than desirable from a software engineering perspective. In this paper, we describe the fairml R package, which implements our previous work (Scutari, Panero, and Proissl 2022) and related models in the literature. fairml is designed around classical statistical models (generalised linear models) and penalised regression results (ridge regression) to produce fair models that are interpretable and whose properties are well known. The constraint used to enforce fairness is orthogonal to model estimation, making it possible to mix and match the desired model family and fairness definition for each application. Furthermore, fairml provides facilities for model estimation, model selection and validation, including diagnostic plots.
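
To make the idea of a fairness constraint that is separate from model estimation more concrete, the following is a minimal sketch in Python. It is not the fairml API and is not taken from the paper. It assumes one numeric predictor, one numeric sensitive attribute and a linear model, and every name in it (fit_fair_ridge, variance_share, bound) is hypothetical. The predictor is first decorrelated from the sensitive attribute, then a ridge penalty is applied to the sensitive attribute's coefficient only, and the penalty is increased until a pluggable unfairness measure falls below a user-chosen bound.

```python
# Illustrative sketch only: NOT the fairml API. All names are hypothetical.
from statistics import fmean


def center(v):
    """Subtract the mean so that no intercept is needed."""
    m = fmean(v)
    return [a - m for a in v]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def variance_share(alpha, beta, s, u):
    """Share of the fitted values' variance explained by the sensitive attribute.

    Assumed fairness definition for this sketch; any callable with the same
    signature that decreases as |alpha| shrinks could be swapped in.
    """
    from_s = alpha ** 2 * dot(s, s)
    from_u = beta ** 2 * dot(u, u)
    total = from_s + from_u
    return from_s / total if total > 0 else 0.0


def fit_fair_ridge(y, x, s, bound, measure=variance_share,
                   max_lambda=1e9, iters=200):
    """Fit y ~ alpha * s + beta * u, penalising only alpha.

    Assumes x is not perfectly collinear with s and that bound > 0.
    """
    y, x, s = center(y), center(x), center(s)
    # Step 1: remove from x the part that is linearly explained by s.
    gamma = dot(x, s) / dot(s, s)
    u = [xi - gamma * si for xi, si in zip(x, s)]
    # Step 2: s and u are orthogonal, so the two coefficients decouple.
    beta = dot(u, y) / dot(u, u)

    def alpha_at(lam):
        # Closed-form ridge estimate for a single penalised coefficient.
        return dot(s, y) / (dot(s, s) + lam)

    def result(lam):
        a = alpha_at(lam)
        return {"alpha": a, "beta": beta, "lambda": lam,
                "unfairness": measure(a, beta, s, u)}

    # Step 3: find the smallest penalty that satisfies the bound (bisection).
    if measure(alpha_at(0.0), beta, s, u) <= bound:
        return result(0.0)
    lo, hi = 0.0, max_lambda
    for _ in range(iters):
        mid = (lo + hi) / 2
        if measure(alpha_at(mid), beta, s, u) <= bound:
            hi = mid
        else:
            lo = mid
    return result(hi)


# Toy data: s is a binary sensitive attribute, x a correlated predictor.
s = [0, 0, 0, 0, 1, 1, 1, 1]
x = [1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 2.0, 2.9, 4.2, 3.9, 5.1, 6.0, 7.1]

fit = fit_fair_ridge(y, x, s, bound=0.05)
print(fit)
```

Because the unfairness measure is passed in as a function, the estimation step does not need to change when the fairness definition does, which is the kind of mix-and-match design the abstract describes. Extending the sketch to several predictors or to generalised linear models would require a matrix-based ridge solver and is beyond this illustration.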

