In recent years, the broad applicability of deep learning to downstream tasks and its end-to-end training capabilities have raised growing concerns about models latching onto specific, non-representative patterns. Many works on unsupervised debiasing leverage the tendency of deep models to learn ``easier'' samples first, for example by clustering the latent space to obtain bias pseudo-labels. However, interpreting such pseudo-labels is not trivial, especially for a non-expert end user, as they carry no semantic information about the bias features. To address this issue, we introduce ``Say My Name'' (SaMyNa), the first tool to semantically identify biases within deep models. Unlike existing methods, our approach focuses on the biases actually learned by the model. Our text-based pipeline enhances explainability and supports debiasing efforts: applicable either during training or in post-hoc validation, it can disentangle task-related information and serves as a tool for bias analysis. Evaluation on traditional benchmarks demonstrates its effectiveness in detecting biases and even ruling them out, showcasing its broad applicability for model diagnosis.
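To make the pseudo-labeling idea referenced above concrete, here is a minimal sketch, assuming k-means over a model's penultimate-layer embeddings; the variable names (\texttt{features}, \texttt{bias\_pseudo\_labels}) and the choice of k-means are illustrative assumptions, not SaMyNa's actual pipeline.

\begin{verbatim}
# Minimal sketch: derive bias pseudo-labels by clustering a model's
# latent space, the generic idea used by unsupervised debiasing works.
# The k-means choice and all names here are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for penultimate-layer embeddings of a trained model,
# shape (n_samples, feature_dim).
features = rng.normal(size=(1000, 128))

# Each cluster index acts as a bias pseudo-label: samples the model
# embeds similarly (often the "easier", bias-aligned ones) group together.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
bias_pseudo_labels = kmeans.fit_predict(features)
print(bias_pseudo_labels[:10])
\end{verbatim}

In such pipelines, the cluster indices are opaque integers handed to a downstream debiasing scheme; the point of SaMyNa, as stated above, is to instead attach a semantic, textual description to the bias.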