Towards Fair and Explainable AI using a Human-Centered AI Approach

The rise of machine learning (ML) has been accompanied by several high-profile cases that underscore the need for fairness, accountability, explainability, and trust in ML systems. The existing literature has largely focused on fully automated ML approaches that optimize for a chosen performance metric. However, human-centric measures such as fairness, trust, and explainability are subjective, context-dependent, and may not correlate with conventional performance metrics. To address these challenges, we explore a human-centered AI approach that empowers people by providing greater transparency and human control. In this dissertation, we present five research projects that aim to enhance explainability and fairness in classification systems and word embeddings. The first project explores the benefits and drawbacks of introducing local model explanations as interfaces for machine teachers (crowd workers). Our study found that adding explanations supports trust calibration for the resulting ML model and enables rich forms of teaching feedback. The second project presents D-BIAS, a causality-based, human-in-the-loop visual tool for identifying and mitigating social biases in tabular datasets. Beyond improving fairness, we found that the tool also enhances trust and accountability. The third project presents WordBias, an interactive visual tool that helps audit pre-trained static word embeddings for biases against groups, such as females, or subgroups, such as Black Muslim females. The fourth project presents DramatVis Personae, a visual analytics tool that helps identify social biases in creative writing. The fifth project presents an empirical study of the cumulative impact of multiple fairness-enhancing interventions, applied at different stages of the ML pipeline, on fairness, utility, and different population groups. We conclude by discussing future research directions.
