Prototypical part network (ProtoPNet) methods have been designed to achieve interpretable classification by associating predictions with a set of training prototypes. We refer to these as trivial prototypes because they are trained to lie far from the classification boundary in the feature space. ProtoPNet can be compared to a support vector machine (SVM), since both methods classify by computing similarity to a set of training points (trivial prototypes in ProtoPNet, support vectors in SVM). However, whereas trivial prototypes lie far from the classification boundary, support vectors lie close to it. We argue that this discrepancy with well-established SVM theory can lead to ProtoPNet models with inferior classification accuracy. In this paper, we aim to improve ProtoPNet classification with a new method that learns support prototypes lying near the classification boundary in the feature space, as suggested by SVM theory. In addition, we propose a new model, named ST-ProtoPNet, which combines our support prototypes with the trivial prototypes to provide more effective classification. Experimental results on the CUB-200-2011, Stanford Cars, and Stanford Dogs datasets demonstrate that ST-ProtoPNet achieves state-of-the-art classification accuracy and interpretability results. We also show that the proposed support prototypes tend to be better localised on the object of interest than on the background region.
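The shared idea behind the ProtoPNet/SVM analogy (a class score built as a weighted sum of similarities to stored points) can be illustrated with a minimal NumPy sketch. It assumes the log-activation similarity `log((d + 1) / (d + eps))` on squared L2 distances used by the original ProtoPNet, and an RBF-kernel SVM decision function; the function names, array shapes, `eps`, `gamma`, and example values are illustrative assumptions. The sketch does not implement the support-prototype training or ST-ProtoPNet itself.

```python
import numpy as np


def prototype_similarity(patches: np.ndarray, prototypes: np.ndarray,
                         eps: float = 1e-4) -> np.ndarray:
    """ProtoPNet-style similarity (assumed log activation).

    patches: (P, D) latent patch features of one image.
    prototypes: (M, D) prototype vectors.
    Returns (M,) scores; larger means a patch lies closer to the prototype.
    """
    diffs = patches[:, None, :] - prototypes[None, :, :]   # (P, M, D)
    sq_dist = np.sum(diffs ** 2, axis=-1)                  # (P, M)
    min_dist = sq_dist.min(axis=0)                         # closest patch per prototype
    return np.log((min_dist + 1.0) / (min_dist + eps))


def protopnet_logits(patches: np.ndarray, prototypes: np.ndarray,
                     class_weights: np.ndarray) -> np.ndarray:
    """Class logits as a weighted sum of prototype similarities.

    class_weights: (C, M) linear layer (hypothetical values in the usage below).
    """
    return class_weights @ prototype_similarity(patches, prototypes)


def svm_decision(x: np.ndarray, support_vectors: np.ndarray,
                 dual_coefs: np.ndarray, bias: float, gamma: float = 0.5) -> float:
    """Kernel-SVM decision value: weighted sum of RBF similarities to support vectors."""
    sims = np.exp(-gamma * np.sum((support_vectors - x) ** 2, axis=1))
    return float(dual_coefs @ sims + bias)


if __name__ == "__main__":
    # Toy 2-D features; the values are illustrative only.
    patches = np.array([[0.0, 0.0], [3.0, 3.0]])
    prototypes = np.array([[0.0, 0.0], [10.0, 10.0]])
    weights = np.array([[1.0, -0.5], [-0.5, 1.0]])
    print(protopnet_logits(patches, prototypes, weights))
    print(svm_decision(np.zeros(2), np.array([[0.0, 0.0], [1.0, 1.0]]),
                       np.array([1.0, -1.0]), bias=0.0))
```

Both functions reduce to "similarity to stored points, then a weighted sum". The abstract's point concerns where those stored points sit relative to the boundary: SVM support vectors lie near it, while trivial prototypes are trained to lie far from it.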