Deep learning models are effective, yet brittle: even when carefully trained, their behavior tends to be hard to predict when confronted with out-of-distribution samples. In this work, we propose a simple yet effective solution to predict and describe, via natural language, the potential failure modes of computer vision models. Given a pretrained model and a set of samples, our aim is to find sentences that accurately describe the visual conditions in which the model underperforms. To study this important topic and foster future research on it, we formalize the problem of Language-Based Error Explainability (LBEE) and propose a set of metrics to evaluate and compare different methods for this task. Our proposed solutions operate in a joint vision-and-language embedding space and can characterize, through language descriptions, model failures caused, e.g., by objects unseen during training or by adverse visual conditions. We experiment with different tasks, such as classification in the presence of dataset bias and semantic segmentation in unseen environments, and show that the proposed methodology isolates nontrivial sentences associated with specific error causes. We hope our work will help practitioners better understand the behavior of their models, increasing overall safety and interpretability.
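To make the core idea concrete, the sketch below shows one simple way to rank candidate sentences against a model's failure cases in a joint vision-and-language embedding space. It is not the paper's exact LBEE method: CLIP stands in for the joint embedding space, and the hard/easy sample split, the candidate-sentence list, and the centroid-difference score are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's exact method): given images on
# which a pretrained task model performs poorly ("hard") vs. well ("easy"),
# rank candidate natural-language descriptions of visual conditions by how much
# closer they are, in CLIP space, to the hard samples than to the easy ones.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def embed_images(images):
    """L2-normalized CLIP embeddings for a list of PIL images."""
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

@torch.no_grad()
def embed_sentences(sentences):
    """L2-normalized CLIP embeddings for a list of candidate descriptions."""
    inputs = processor(text=sentences, return_tensors="pt", padding=True)
    feats = model.get_text_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

def rank_failure_descriptions(hard_images, easy_images, sentences, top_k=5):
    """Score each sentence by its similarity to the centroid of the hard
    (high-error) samples minus its similarity to the easy (low-error) ones,
    and return the top-k sentences as candidate failure-mode descriptions."""
    hard_centroid = embed_images(hard_images).mean(dim=0)
    easy_centroid = embed_images(easy_images).mean(dim=0)
    text = embed_sentences(sentences)
    scores = text @ hard_centroid - text @ easy_centroid
    order = scores.argsort(descending=True)[:top_k]
    return [(sentences[i], scores[i].item()) for i in order]

# Usage (hypothetical data): `hard` and `easy` are lists of PIL images, and
# `candidates` holds descriptions such as "a photo taken at night" or
# "a street scene in heavy fog":
#   print(rank_failure_descriptions(hard, easy, candidates))
```

Sentences that score high under such a scheme are those that describe what the hard samples share and the easy samples lack, which is the kind of language-based error characterization the paper targets.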