Discover the Unknown Biased Attribute of an Image Classifier

Recent work has shown that AI algorithms learn biases from data, so identifying those biases is urgent and vital. However, existing bias identification pipelines rely heavily on human experts to conjecture potential biases (e.g., gender), which may overlook other underlying biases that humans have not anticipated. To help human experts find the biases of AI algorithms, we study a new problem in this work: for a classifier that predicts a target attribute of the input image, discover its unknown biased attribute. To solve this challenging problem, we use a hyperplane in the generative model's latent space to represent an image attribute; the original problem is thus transformed into optimizing the hyperplane's normal vector and offset. Within this framework, we propose a novel total-variation loss as the objective function and a new orthogonalization penalty as a constraint. The latter prevents trivial solutions in which the discovered biased attribute is identical to the target attribute or to one of the known biased attributes. Extensive experiments on both disentanglement datasets and real-world datasets show that our method can discover biased attributes and achieve better disentanglement with respect to the target attributes. Furthermore, qualitative results show that our method can discover unnoticeable biased attributes for various object and scene classifiers, demonstrating its generalizability for detecting biased attributes in diverse image domains. The code is available at https://git.io/J3kMh.
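
The abstract does not give formulas, so the following is only a minimal sketch of how the hyperplane formulation might be read, under stated assumptions. It assumes the latent code is projected onto the hyperplane and moved along the unit normal, that the total-variation term sums absolute changes in the classifier's prediction along that traversal, and that the orthogonalization penalty is the squared cosine similarity to known normal vectors. The names `traverse`, `total_variation`, `orthogonalization_penalty`, `toy_generator`, `toy_classifier`, and the weight `lam` are hypothetical, and the toy generator and classifier stand in for real networks.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(w):
    norm = math.sqrt(dot(w, w))
    return [a / norm for a in w]

def traverse(z, w, b, alphas):
    """Project z onto the hyperplane w.x + b = 0, then step along the unit normal."""
    n = unit(w)
    dist = (dot(w, z) + b) / math.sqrt(dot(w, w))  # signed distance from z to the hyperplane
    z_proj = [zi - dist * ni for zi, ni in zip(z, n)]
    return [[zi + a * ni for zi, ni in zip(z_proj, n)] for a in alphas]

def total_variation(probs):
    """Sum of absolute changes between consecutive predictions (assumed form)."""
    return sum(abs(q - p) for p, q in zip(probs, probs[1:]))

def orthogonalization_penalty(w, known_normals):
    """Squared cosine similarity to each known attribute normal (assumed form)."""
    n = unit(w)
    return sum(dot(n, unit(k)) ** 2 for k in known_normals)

# Toy stand-ins (assumptions): an identity "generator" and a classifier whose
# prediction depends on latent axis 0 (the target) and axis 1 (the hidden bias).
def toy_generator(z):
    return z

def toy_classifier(x):
    return 1.0 / (1.0 + math.exp(-(2.0 * x[0] + 1.0 * x[1])))

def loss(w, b, z, alphas, known_normals, lam=1.0):
    # Minimizing this maximizes prediction variation while staying away from known normals.
    probs = [toy_classifier(toy_generator(p)) for p in traverse(z, w, b, alphas)]
    return -total_variation(probs) + lam * orthogonalization_penalty(w, known_normals)

z = [0.3, -0.2, 0.5, 0.1]
alphas = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
target_normal = [1.0, 0.0, 0.0, 0.0]

candidates = {
    "target_axis": [1.0, 0.0, 0.0, 0.0],   # trivial solution: same as the target attribute
    "biased_axis": [0.0, 1.0, 0.0, 0.0],   # orthogonal to the target, still changes predictions
    "neutral_axis": [0.0, 0.0, 1.0, 0.0],  # classifier ignores this direction
}
losses = {name: loss(w, 0.0, z, alphas, [target_normal]) for name, w in candidates.items()}
best = min(losses, key=losses.get)
print(losses, best)
```

In this toy setting the penalty cancels the large variation obtained by simply rediscovering the target direction, so the lowest loss goes to the direction that changes the prediction while being orthogonal to the target. A real implementation would optimize the normal vector and offset by gradient descent through the actual generator and classifier rather than comparing a few fixed candidates.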
