The use of automated decision tools in recruitment has received increasing attention. In November 2021, the New York City Council passed legislation (Local Law 144) that mandates bias audits of Automated Employment Decision Tools. From 15 April 2023, companies that use automated tools to hire or promote employees are required to have these systems audited by an independent entity. Auditors must compute bias metrics that compare outcomes across groups defined, at a minimum, by sex/gender and race/ethnicity categories. Local Law 144 proposes novel bias metrics for regression tasks, that is, scenarios in which the automated system scores candidates on a continuous range of values. A previous version of the legislation proposed a bias metric that compared the mean scores of different groups. The revised metric instead compares the proportion of candidates in each group who fall above the overall median score. In this paper, we argue that both metrics fail to capture distributional differences over the whole domain and therefore cannot reliably detect bias. We first introduce two metrics as possible alternatives to those in the legislation. We then compare these metrics over a range of theoretical examples for which the metrics proposed in the legislation appear to underestimate bias. Finally, we study real data and show that the legislation's metrics can fail in the same way in a real-world recruitment application.
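To make the two legislation metrics concrete, the following is a minimal Python sketch. It assumes each metric is reported as an impact ratio, i.e. a group's value divided by that of the best-scoring group. The function names and the toy scores are hypothetical and are not taken from the paper or from the Local Law 144 rules. As a whole-distribution contrast, the sketch also computes the two-sample Kolmogorov-Smirnov statistic. This is a standard measure swapped in here for illustration and is not necessarily one of the two metrics the paper introduces.

```python
from statistics import mean, median


def mean_score_impact_ratios(scores_by_group):
    """Earlier-draft style metric: each group's mean score divided by the
    highest group mean (ratio form assumed for illustration)."""
    means = {g: mean(s) for g, s in scores_by_group.items()}
    best = max(means.values())
    if best == 0:
        raise ValueError("highest group mean is zero; ratio undefined")
    return {g: m / best for g, m in means.items()}


def scoring_rate_impact_ratios(scores_by_group):
    """Revised metric: share of each group scoring strictly above the median
    of the pooled sample, divided by the highest such share."""
    pooled = [x for s in scores_by_group.values() for x in s]
    cut = median(pooled)
    rates = {g: sum(x > cut for x in s) / len(s) for g, s in scores_by_group.items()}
    best = max(rates.values())
    if best == 0:
        raise ValueError("no candidate scores above the median; ratio undefined")
    return {g: r / best for g, r in rates.items()}


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    two empirical CDFs, evaluated at every observed score."""
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))


if __name__ == "__main__":
    # Hypothetical scores on a 0-100 scale: same mean, same share above the median.
    scores = {"A": [0, 0, 100, 100], "B": [40, 40, 60, 60]}
    print(mean_score_impact_ratios(scores))        # {'A': 1.0, 'B': 1.0}
    print(scoring_rate_impact_ratios(scores))      # {'A': 1.0, 'B': 1.0}
    print(ks_statistic(scores["A"], scores["B"]))  # 0.5
    # Hypothetical hiring cutoff of 80: half of group A passes, nobody in group B does.
    print({g: sum(x >= 80 for x in s) / len(s) for g, s in scores.items()})  # {'A': 0.5, 'B': 0.0}
```

In this toy example both legislation-style metrics report parity (an impact ratio of 1.0 for each group). A hypothetical cutoff of 80 would nonetheless select half of group A and none of group B. The Kolmogorov-Smirnov statistic of 0.5 reflects that difference because it compares the two score distributions over the whole domain rather than at a single summary point.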