Driven by the growing prevalence of trackers, the proliferation of IoT sensors, and the declining cost of computing power, geospatial information has come to play a pivotal role in contemporary predictive models. While it improves predictive performance, geospatial data can also perpetuate historical socio-economic patterns, raising concerns about a resurgence of biases and exclusionary practices and their disproportionate impacts on society. In response, this paper emphasizes the need to identify and rectify such biases and calibration errors in predictive models, particularly as algorithms become more intricate and less interpretable. The increasing granularity of geospatial information raises further ethical concerns, since the choice of geographical scale may exacerbate disparities akin to redlining and exclusionary zoning. To address these issues, we propose a toolkit for identifying and mitigating biases arising from geospatial data. We extend classical fairness definitions, which focus on binary classification, to an ordinal regression setting with spatial attributes. This extension allows us to gauge disparities stemming from data aggregation levels and to advocate for a less intrusive correction approach. We illustrate the methodology on a Parisian real estate dataset, showcasing practical applications and examining how the choice of geographical aggregation level affects fairness and calibration measures.