The adoption of machine learning in applications where fairness and accountability are crucial has led to a large number of model proposals in the literature, most of them formulated as optimisation problems with constraints that reduce or eliminate the effect of sensitive attributes on the response. While this approach is very flexible from a theoretical perspective, the resulting models are somewhat black-box in nature: very little can be said about their statistical properties, about best practices for their applied use, or about how they can be extended to problems other than those they were originally designed for. Furthermore, estimating each model requires a bespoke implementation built around an appropriate solver, which is less than desirable from a software engineering perspective. In this paper, we describe the fairml R package, which implements our previous work (Scutari, Panero, and Proissl 2022) and related models from the literature. fairml is designed around classical statistical models (generalised linear models) and penalised regression results (ridge regression) to produce fair models that are interpretable and whose properties are well known. The constraint used to enforce fairness is orthogonal to model estimation, making it possible to mix and match the desired model family and fairness definition for each application. In addition, fairml provides facilities for model estimation, model selection, and validation, including diagnostic plots.