To understand how deep neural networks make classification predictions, recent research has focused on techniques that explain those predictions. However, most existing methods cannot be easily applied to semantic segmentation; moreover, they are not designed to offer interpretability in the multi-annotator setting. Instead of assuming that ground-truth pixel-level labels come from a single annotator with a consistent labeling tendency, we aim to provide interpretable semantic segmentation and answer two critical yet practical questions: "who" contributes to the resulting segmentation, and "why" such an assignment is determined. In this paper, we present a learning framework, the Tendency-and-Assignment Explainer (TAX), designed to offer interpretability at both the annotator and assignment levels. More specifically, we learn convolution kernel subsets that model the labeling tendencies of each type of annotation, while a prototype bank is jointly observed to offer visual guidance for learning these kernels. For evaluation, we consider both synthetic and real-world datasets with multiple annotators. We show that TAX can be applied to state-of-the-art network architectures with comparable performance, while offering segmentation interpretability at both levels.