The widespread adoption of deep-learning models in data-driven applications has drawn attention to the potential risks posed by biased datasets and models. Neglected or hidden biases in datasets and models can lead to unexpected results. This study addresses the challenges of dataset bias and explores ``shortcut learning'', also known as the ``Clever Hans effect'', in binary classifiers. We propose a novel framework for analyzing black-box classifiers and for examining the impact of both training and test data on classifier scores. Our framework combines interventional and observational perspectives, employing a linear mixed-effects model for post-hoc analysis. By evaluating classifier performance beyond error rates, we aim to provide insights into biased datasets and a comprehensive understanding of their influence on classifier behavior. The effectiveness of our approach is demonstrated through experiments on audio anti-spoofing and speaker verification tasks using both statistical models and deep neural networks. The insights gained from this study have broader implications for tackling biases in other domains and for advancing the field of explainable artificial intelligence.