Simplified Concrete Dropout -- Improving the Generation of Attribution Masks for Fine-grained Classification

Fine-grained classification is a special case of classification that aims to distinguish objects which share a similar visual appearance and differ only in subtle details. Fine-grained classification models are often deployed to determine animal species or individuals in automated animal monitoring systems. Precise visual explanations of a model's decision are crucial for analyzing systematic errors. Attention- or gradient-based methods are commonly used to identify the image regions that contribute most to the classification decision. However, these methods deliver explanations that are either too coarse or too noisy, which makes them unsuitable for reliably identifying subtle visual differences. In contrast, perturbation-based methods can precisely identify the pixels that are causally responsible for the classification result. The fill-in of the dropout (FIDO) algorithm is one of those methods. It uses concrete dropout (CD) to sample a set of attribution masks and updates the sampling parameters based on the output of the classification model. A known problem of the algorithm is the high variance of its gradient estimates, which has so far been mitigated by mini-batch updates of the sampling parameters. This paper presents a solution that circumvents these computational instabilities by simplifying the CD sampling and reducing the reliance on large mini-batch sizes. First, our solution allows the parameters to be estimated with smaller mini-batch sizes, without losing quality in the estimates and with reduced computational effort. Furthermore, it produces finer and more coherent attribution masks. Finally, we use the resulting attribution masks to improve the classification performance of a trained model without additional fine-tuning of the model.
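To illustrate the sampling step described above, the following minimal sketch draws attribution masks with the standard concrete (relaxed Bernoulli) relaxation. It is not the simplified sampling proposed in this paper, whose details the abstract does not give. The function names, the `temperature` value, and the toy 2x2 grid of keep probabilities are assumptions chosen for illustration only.

```python
import math
import random


def logit(p, eps=1e-7):
    """Log-odds of p, clipped away from 0 and 1 for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p) - math.log(1.0 - p)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_concrete_mask(keep_probs, temperature=0.1, rng=random):
    """Draw one relaxed Bernoulli mask (assumed standard concrete relaxation).

    keep_probs is a 2D list of per-pixel keep probabilities, i.e. the
    sampling parameters that a FIDO-style method would update.
    Each mask value lies in [0, 1] and approaches a hard 0/1 decision
    as the temperature approaches 0.
    """
    mask = []
    for row in keep_probs:
        mask_row = []
        for p in row:
            u = rng.random()  # uniform noise in [0, 1)
            mask_row.append(sigmoid((logit(p) + logit(u)) / temperature))
        mask.append(mask_row)
    return mask


def sample_mask_batch(keep_probs, batch_size, temperature=0.1, rng=random):
    """Draw a mini-batch of masks; larger batches average out sampling noise."""
    return [sample_concrete_mask(keep_probs, temperature, rng)
            for _ in range(batch_size)]


# Hypothetical 2x2 parameter grid: the left column is likely kept,
# the right column is likely dropped.
keep_probs = [[0.9, 0.1],
              [0.9, 0.1]]
masks = sample_mask_batch(keep_probs, batch_size=200, rng=random.Random(0))
```

In a FIDO-style optimization, each sampled mask would be applied to the image, the masked image passed through the classifier, and the keep probabilities updated from the resulting loss. The mini-batch size in this sketch corresponds to the number of masks drawn per update, which is the quantity the proposed simplification aims to reduce.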
