Machine learning models can perform well on in-distribution data but often fail on biased subgroups that are underrepresented in the training data, which undermines their robustness in applications that demand reliability. Such subgroups are typically unknown because subgroup labels are unavailable. Discovering biased subgroups is key to understanding a model's failure modes and improving its robustness. Most prior work on subgroup discovery implicitly assumes that a model underperforms on only a single biased subgroup, an assumption that does not hold for in-the-wild data, where multiple biased subgroups exist. In this work, we propose Decomposition, Interpretation, and Mitigation (DIM), a novel method that addresses the more challenging but also more practical problem of discovering multiple biased subgroups in image classifiers. Our approach decomposes the image features into multiple components that represent multiple subgroups. This decomposition is achieved via a bilinear dimension reduction method, Partial Least Squares (PLS), guided by supervision from the image classifier. We further interpret the semantic meaning of each subgroup component by generating natural language descriptions with vision-language foundation models. Finally, DIM mitigates multiple biased subgroups simultaneously via two strategies, one data-centric and one model-centric. Extensive experiments on the CIFAR-100 and Breeds datasets demonstrate the effectiveness of DIM in discovering and mitigating multiple biased subgroups. Furthermore, DIM uncovers the failure modes of the classifier on Hard ImageNet, showcasing its broader applicability to understanding model bias in image classifiers. The code is available at https://github.com/ZhangAIPI/DIM.
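To make the decomposition step concrete, the following is a minimal sketch of a PLS decomposition of feature vectors guided by a scalar supervision signal. It is not taken from the released code. It assumes the supervision is a single per-image score (for example, a confidence-like value derived from the classifier), uses a textbook NIPALS-style PLS1 procedure, and runs on synthetic features; the names `pls1_components`, `features`, `signal`, and `true_direction` are hypothetical.

```python
import numpy as np


def pls1_components(features, signal, n_components):
    """NIPALS-style PLS1: find feature directions that covary with `signal`.

    Returns (weights, scores): weights has shape (n_components, n_features),
    scores has shape (n_samples, n_components).
    """
    X = features - features.mean(axis=0)   # center the features
    y = signal - signal.mean()             # center the supervision signal
    weights, scores = [], []
    for _ in range(n_components):
        w = X.T @ y                        # direction of maximal covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to explain
            break
        w /= norm
        t = X @ w                          # per-sample score along this direction
        tt = t @ t
        p = X.T @ t / tt                   # feature loadings
        c = (y @ t) / tt                   # signal loading
        X = X - np.outer(t, p)             # deflate features
        y = y - c * t                      # deflate signal
        weights.append(w)
        scores.append(t)
    return np.array(weights), np.column_stack(scores)


# Synthetic stand-in for image features (assumption: 2000 images, 16-d features).
rng = np.random.default_rng(0)
features = rng.normal(size=(2000, 16))

# Hypothetical supervision: a confidence-like score that depends on two
# hidden feature directions plus a small amount of noise.
true_direction = np.zeros(16)
true_direction[0], true_direction[3] = -1.0, -0.5
signal = features @ true_direction + 0.05 * rng.normal(size=2000)

weights, scores = pls1_components(features, signal, n_components=3)

# Samples with the lowest score along a component are those the signal rates
# worst along that direction: candidate members of one underperforming subgroup.
hardest = np.argsort(scores[:, 0])[:5]
```

In this sketch each row of `weights` is one component in feature space, and the rows are mutually orthogonal, so different components can capture different sources of underperformance. How the components are mapped to natural language descriptions and used for mitigation is outside the scope of the sketch.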