Prototypical part network (ProtoPNet) methods achieve interpretable classification by associating predictions with a set of training prototypes, which we refer to as trivial prototypes because they are trained to lie far from the classification boundary in the feature space. ProtoPNet is analogous to the support vector machine (SVM), since classification in both methods relies on computing similarity with a set of training points (i.e., trivial prototypes in ProtoPNet and support vectors in SVM). However, while trivial prototypes are located far from the classification boundary, support vectors are located close to it, and we argue that this discrepancy with well-established SVM theory can result in ProtoPNet models with inferior classification accuracy. In this paper, we aim to improve the classification accuracy of ProtoPNet with a new method that learns support prototypes lying near the classification boundary in the feature space, as suggested by SVM theory. In addition, we propose a new model, named ST-ProtoPNet, which exploits both our support prototypes and the trivial prototypes to provide more effective classification. Experimental results on the CUB-200-2011, Stanford Cars, and Stanford Dogs datasets demonstrate that ST-ProtoPNet achieves state-of-the-art classification accuracy and interpretability results. We also show that the proposed support prototypes tend to be better localised on the object of interest rather than in the background region.
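To make the prototype-similarity classification described above concrete, the following is a minimal sketch rather than the paper's implementation. It assumes the log-based similarity commonly used in ProtoPNet-style models, log((d + 1) / (d + eps)), where d is the smallest squared distance between a prototype and any image patch feature. The function names, the toy features, and the class weights are all hypothetical, and the sketch does not model how support or trivial prototypes are learned.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototype_similarity(patches, prototype, eps=1e-4):
    """Similarity between an image (a list of patch features) and one prototype.

    Assumption: uses log((d + 1) / (d + eps)), where d is the squared distance
    from the prototype to its closest patch. Smaller distance gives a larger score.
    """
    d_min = min(squared_distance(p, prototype) for p in patches)
    return math.log((d_min + 1.0) / (d_min + eps))


def classify(patches, prototypes, class_weights):
    """Score each class as a weighted sum of prototype similarities."""
    sims = [prototype_similarity(patches, p) for p in prototypes]
    logits = [sum(w * s for w, s in zip(row, sims)) for row in class_weights]
    predicted = max(range(len(logits)), key=logits.__getitem__)
    return predicted, sims


# Toy example (hypothetical values): two patches, one prototype per class.
patches = [[0.0, 0.0], [1.0, 1.0]]
prototypes = [[1.0, 1.0], [5.0, 5.0]]
class_weights = [[1.0, 0.0], [0.0, 1.0]]  # row c holds class c's prototype weights
predicted, sims = classify(patches, prototypes, class_weights)
```

In this toy case the second patch coincides with the first prototype, so the first class receives the highest score. Because each prediction is a weighted sum of similarities to specific prototypes, it can be traced back to the training patches those prototypes represent, which is the source of the interpretability the abstract refers to.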