In this work, we study the effects of feature-based explanations on the distributive fairness of AI-assisted decisions, focusing on the task of predicting occupations from short textual bios. We also investigate how these effects are mediated by humans' fairness perceptions and their reliance on AI recommendations. Our findings show that explanations influence fairness perceptions, which, in turn, relate to humans' tendency to adhere to AI recommendations. However, such explanations do not enable humans to distinguish correct from incorrect AI recommendations. Instead, they may affect reliance irrespective of the correctness of AI recommendations. Depending on which features an explanation highlights, this can foster or hinder distributive fairness: when explanations highlight features that are task-irrelevant and evidently associated with the sensitive attribute, this prompts overrides that counter AI recommendations aligned with gender stereotypes. Conversely, when explanations appear task-relevant, this induces reliance behavior that reinforces stereotype-aligned errors. These results imply that feature-based explanations are not a reliable mechanism for improving distributive fairness.