Data-driven software solutions are increasingly deployed in critical domains with major socio-economic, legal, and ethical implications. The rapid adoption of data-driven solutions, however, poses significant threats to the trustworthiness of automated decision-support software. Primary challenges include developers' diminished understanding of these solutions and historical or ongoing biases in the underlying datasets. To aid data-driven software developers and end-users, we present \toolname, a debugging tool that tests and explains the fairness implications of data-driven solutions. \toolname visualizes the logic of datasets, trained models, and decisions for a given data point. In addition, it trains multiple models with varying fairness-accuracy trade-offs. Crucially, \toolname incorporates counterfactual fairness testing, which finds bugs beyond the development datasets. Using \toolname, we conducted two studies: one measuring false positives and false negatives in prevalent counterfactual testing, and a class survey examining human perception of counterfactual test cases. \toolname and its benchmarks are publicly available at~\url{https://github.com/Pennswood/FairLay-ML}. A live version of the tool is available at~\url{https://fairlayml-v2.streamlit.app/}, and a video demo is available at~\url{https://youtu.be/wNI9UWkywVU?t=127}.