How fair are we? From conceptualization to automated assessment of fairness definitions

Fairness is a critical concept in ethics and social domains, but it is also a challenging property to engineer in software systems. With the increasing use of machine learning in software systems, researchers have been developing techniques to automatically assess the fairness of software systems. However, many of these techniques rely on predefined fairness definitions, metrics, and criteria, which may fail to cover the wide-ranging needs and preferences of users and stakeholders. To overcome this limitation, we propose a novel approach, called MODNESS, that enables users to define and customize their own fairness concepts in a dedicated modeling environment. Our approach guides the user through the definition of new fairness concepts, including in emerging domains, and through the specification and composition of the metrics used to evaluate them. Ultimately, MODNESS generates the source code that implements the fairness assessment based on these custom definitions. In addition, we describe the process we followed to collect and analyze relevant literature on fairness assessment in software engineering (SE). We compare MODNESS with the selected approaches and evaluate how they support the distinguishing features identified by our study. Our findings reveal that i) most current approaches do not support user-defined fairness concepts; ii) our approach covers two additional application domains not addressed by currently available tools, i.e., mitigating bias in recommender systems for software engineering and in Arduino software component recommendations; iii) MODNESS can overcome the limitations of the only two other Model-Driven Engineering-based approaches for fairness assessment.
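
To make the idea of a user-defined fairness concept composed of metrics more concrete, the following is a minimal Python sketch. It is not the code that MODNESS generates, and it does not reflect the MODNESS metamodel. All names (`Metric`, `positive_rate`, `statistical_parity_difference`, `assess`, and the `group` and `recommended` attributes) are hypothetical and chosen only for illustration. The sketch assumes a fairness concept can be expressed as a list of metrics, each with a user-chosen tolerance, evaluated over tabular records.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]  # one data record: attribute name -> value


@dataclass
class Metric:
    """A hypothetical user-defined metric with a pass/fail tolerance."""
    name: str
    compute: Callable[[Sequence[Row]], float]
    threshold: float  # the metric passes when abs(value) <= threshold


def positive_rate(rows: Sequence[Row], attr: str, group: object, outcome: str) -> float:
    """Share of records in `group` whose `outcome` attribute is truthy."""
    members = [r for r in rows if r[attr] == group]
    if not members:
        return 0.0
    return sum(1 for r in members if r[outcome]) / len(members)


def statistical_parity_difference(
    attr: str, privileged: object, unprivileged: object, outcome: str
) -> Callable[[Sequence[Row]], float]:
    """Build a metric function: rate(unprivileged) - rate(privileged)."""
    def compute(rows: Sequence[Row]) -> float:
        return (positive_rate(rows, attr, unprivileged, outcome)
                - positive_rate(rows, attr, privileged, outcome))
    return compute


def assess(rows: Sequence[Row], metrics: List[Metric]) -> Dict[str, Tuple[float, bool]]:
    """Evaluate every metric and report (value, passed) per metric name."""
    results = {}
    for metric in metrics:
        value = metric.compute(rows)
        results[metric.name] = (value, abs(value) <= metric.threshold)
    return results


# Illustrative data (made up): group "A" is recommended 2 times out of 4,
# group "B" is recommended 1 time out of 4.
rows = (
    [{"group": "A", "recommended": True}] * 2
    + [{"group": "A", "recommended": False}] * 2
    + [{"group": "B", "recommended": True}] * 1
    + [{"group": "B", "recommended": False}] * 3
)

# A custom fairness definition: a single metric with a user-chosen tolerance.
definition = [
    Metric(
        name="statistical_parity",
        compute=statistical_parity_difference("group", "A", "B", "recommended"),
        threshold=0.1,
    )
]

results = assess(rows, definition)
print(results)
```

In this sketch the tolerance of 0.1 is an arbitrary choice. The point is only that the definition, the metrics, and their thresholds are data supplied by the user rather than fixed by the tool.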
