Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups defined by combinations of demographic or other sensitive attributes. The standard approach is to stratify the evaluation data across subgroups and compute performance metrics separately for each group. However, even for moderately sized evaluation datasets, sample sizes quickly become small once intersectional subgroups are considered, which sharply limits the extent to which intersectional groups can be included in the analysis. In this work, we introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups. We provide corresponding inference strategies for constructing confidence intervals and explore how goodness-of-fit testing can yield insight into the structure of fairness-related harms experienced by intersectional groups. We evaluate our approach on two publicly available datasets, and several variants of semi-synthetic data. The results show that our method is considerably more accurate than the standard approach, especially for small subgroups, and demonstrate how goodness-of-fit testing helps identify the key factors that drive differences in performance.
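The contrast between the standard stratified approach and a structured regression can be sketched as follows. This is a minimal illustration, not the paper's actual method: it simulates per-example correctness for subgroups defined by two hypothetical binary attributes (with one very small intersectional group), then compares naive per-group sample means against predictions from a main-effects linear probability model fit by least squares, which borrows strength across groups.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: subgroups defined by two binary attributes (a, b).
# True per-group accuracy follows an additive (main-effects) structure.
base, eff_a, eff_b = 0.80, -0.10, -0.05
groups = [(a, b) for a in (0, 1) for b in (0, 1)]
# The (1, 1) intersection is deliberately tiny, mimicking the small-sample problem.
n_per_group = {(0, 0): 400, (0, 1): 50, (1, 0): 50, (1, 1): 8}

rows, y = [], []
for (a, b) in groups:
    p = base + eff_a * a + eff_b * b          # true accuracy for this subgroup
    n = n_per_group[(a, b)]
    rows += [(a, b)] * n
    y += list(rng.binomial(1, p, size=n))     # 1 = correct prediction, 0 = error
X = np.array([[1.0, a, b] for (a, b) in rows])
y = np.array(y, dtype=float)

# Standard (stratified) estimate: per-group sample mean.
naive = {g: y[[r == g for r in rows]].mean() for g in groups}

# Structured estimate: fit intercept + main effects, then predict each group's
# accuracy from the fitted model rather than from its own few samples alone.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
structured = {(a, b): float(np.array([1.0, a, b]) @ coef) for (a, b) in groups}

for g in groups:
    true_p = base + eff_a * g[0] + eff_b * g[1]
    print(g, "naive:", round(naive[g], 3),
          "structured:", round(structured[g], 3), "true:", true_p)
```

For the small (1, 1) group, the naive estimate rests on only 8 samples, while the structured estimate is informed by all 508 observations through the shared main effects; the paper's goodness-of-fit testing addresses when such an additive structure is, or is not, adequate.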