The potential for bias and unfairness in AI-supported government services raises ethical and legal concerns. Using crime rate prediction with Bristol City Council data as a case study, we examine why these issues persist. Rather than auditing deployed real-world systems, our goal is to understand why widely adopted bias mitigation techniques often fail when applied to government data. Our findings reveal that such techniques are not always effective: the cause lies not in flaws in model architecture or metric selection, but in inherent properties of the data itself. Comparing a comprehensive set of models and fairness methods, our experiments consistently show that mitigation efforts cannot overcome the unfairness embedded in the data, further reinforcing that the bias originates in the structure and history of government datasets. We then investigate why mitigation fails for predictive models on government data, highlighting three potential sources of unfairness: data distribution shifts, the accumulation of historical bias, and delays in data release. Through a set of intersectional fairness experiments, we also uncover the blind spots of fairness analyses and bias mitigation methods that target only a single sensitive feature. Although this study is limited to one city, its findings serve as an early warning that biases in government data may persist even under standard mitigation methods.