The fairness of machine learning (ML) approaches is critical to the reliability of modern artificial intelligence systems. Although fairness has been studied extensively, the fairness of ML models in the software engineering (SE) domain has not yet been well explored. As a result, many ML-powered software systems, particularly those used within the software engineering community, remain prone to fairness issues. Taking a typical SE task, code reviewer recommendation, as its subject, this paper conducts the first study of the fairness of ML applications in the SE domain. Our empirical study demonstrates that current state-of-the-art ML-based code reviewer recommendation techniques exhibit unfair and discriminatory behavior. Specifically, male reviewers receive on average 7.25% more recommendations than female reviewers, relative to their distribution in the reviewer set. This paper also discusses why the studied ML-based code reviewer recommendation systems are unfair and provides solutions to mitigate the unfairness. Our study further indicates that existing mitigation methods can enhance fairness by 100% in projects with a similar distribution of protected and privileged groups, but that their effectiveness on imbalanced or skewed data is limited. Finally, we suggest a solution that overcomes the drawbacks of existing mitigation techniques and tackles bias in imbalanced or skewed datasets.