Information-Theoretic Testing and Debugging of Fairness Defects in Deep Neural Networks

Deep feedforward neural networks (DNNs) are increasingly deployed in socioeconomically critical decision-support software systems. DNNs are exceptionally good at finding minimal, sufficient statistical patterns within their training data. Consequently, DNNs may learn to encode decisions that amplify existing biases or introduce new ones, and that may disadvantage protected individuals or groups and violate legal protections. While existing search-based software testing approaches have been effective in discovering fairness defects, they do not supplement these defects with debugging aids, such as severity and causal explanations, that are crucial to helping developers triage them and decide on the next course of action. Can we measure the severity of fairness defects in DNNs? Are these defects symptomatic of improper training, or do they merely reflect biases present in the training data? To answer such questions, we present DICE: an information-theoretic testing and debugging framework to discover and localize fairness defects in DNNs. The key goal of DICE is to assist software developers in triaging fairness defects by ordering them by severity. Towards this goal, we quantify fairness in terms of the protected information (in bits) used in decision making. A quantitative view of fairness defects not only helps in ordering these defects; our empirical evaluation shows that it also improves search efficiency due to the resulting smoothness of the search space. Guided by this quantitative notion of fairness, we present a causal debugging framework to localize inadequately trained layers and neurons responsible for fairness defects. Our experiments over ten DNNs, developed for socially critical tasks, show that DICE efficiently characterizes the amount of discrimination, effectively generates discriminatory instances, and localizes layers/neurons with significant biases.
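To make the idea of measuring protected information "in bits" concrete, the following is a minimal sketch under stated assumptions, not DICE's actual implementation. It assumes one plausible reading of the metric: fix an individual's non-protected attributes, enumerate every combination of protected attribute values with equal weight, and take the Shannon entropy of the resulting decisions. The names `protected_bits`, `fair_model`, and `biased_model`, and the toy attributes `income` and `sex`, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import product


def protected_bits(model, non_protected, protected_domains):
    """Shannon entropy (in bits) of the model's decision when only the
    protected attributes vary, each combination weighted equally.

    Assumption: this is an illustrative metric, not the paper's definition.
    """
    names = list(protected_domains)
    outcomes = Counter()
    for values in product(*(protected_domains[n] for n in names)):
        # Same individual, different protected attribute values.
        x = {**non_protected, **dict(zip(names, values))}
        outcomes[model(x)] += 1
    total = sum(outcomes.values())
    # Sum p * log2(1/p) over the observed decisions.
    return sum((c / total) * math.log2(total / c) for c in outcomes.values())


def fair_model(x):
    # Hypothetical classifier that ignores the protected attribute.
    return int(x["income"] > 50)


def biased_model(x):
    # Hypothetical classifier that leaks the protected attribute.
    return int(x["income"] > 50 or x["sex"] == "male")


if __name__ == "__main__":
    domains = {"sex": ["female", "male"]}
    individuals = [{"income": 40}, {"income": 60}]
    # Order individuals by severity: most protected information first.
    ranked = sorted(
        individuals,
        key=lambda ind: protected_bits(biased_model, ind, domains),
        reverse=True,
    )
    for ind in ranked:
        print(ind, protected_bits(biased_model, ind, domains))
```

Under this reading, a score of 0 bits means the decision is unaffected by the protected attributes for that individual, while larger scores indicate that more protected information reaches the decision. Sorting by the score gives one simple way to order defects by severity.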
