This short paper presents preliminary research on the Case-Enhanced Vision Transformer (CEViT), a similarity measurement method aimed at improving the explainability of similarity assessments for image data. Initial experimental results suggest that integrating CEViT into k-Nearest Neighbor (k-NN) classification yields classification accuracy comparable to state-of-the-art computer vision models, while also making it possible to illustrate differences between classes. CEViT explanations can be influenced by prior cases, so that they illustrate the aspects of similarity relevant to those cases.
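The abstract describes using CEViT as the similarity measure inside k-NN classification. The following is a minimal sketch of that k-NN step. It assumes only that the similarity measure can be called as a function `similarity(query, case)` that returns a higher score for more similar pairs. The `cosine_similarity` function is a hypothetical stand-in over plain feature vectors and is not CEViT. The function names, the return format, and the tie-breaking rule are illustrative assumptions. The retrieved neighbors are returned together with the prediction because, in a case-based setting, they are the natural material for an explanation.

```python
import math
from collections import Counter
from typing import Any, Callable, Hashable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Hypothetical stand-in similarity (NOT CEViT): cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_classify(
    query: Any,
    cases: Sequence[Any],
    labels: Sequence[Hashable],
    similarity: Callable[[Any, Any], float],
    k: int = 3,
) -> Tuple[Hashable, List[Tuple[float, Any, Hashable]]]:
    """Predict a label for `query` by majority vote over its k most similar cases.

    Returns the predicted label and the k neighbors as (score, case, label),
    ordered from most to least similar, so they can be shown as an explanation.
    """
    if len(cases) != len(labels):
        raise ValueError("cases and labels must have the same length")
    if not 1 <= k <= len(cases):
        raise ValueError("k must be between 1 and the number of cases")

    # Score every stored case against the query exactly once.
    scored = [(similarity(query, case), case, label) for case, label in zip(cases, labels)]
    # Sort on the score only (cases may not be comparable); the sort is stable,
    # so equally scored cases keep their original order.
    scored.sort(key=lambda item: item[0], reverse=True)
    neighbors = scored[:k]

    # Majority vote. Counter.most_common keeps first-seen order among equal
    # counts, so a tie goes to the label of the most similar tied neighbor.
    votes = Counter(label for _, _, label in neighbors)
    predicted = votes.most_common(1)[0][0]
    return predicted, neighbors


if __name__ == "__main__":
    cases = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    labels = ["cat", "cat", "dog", "dog"]
    label, neighbors = knn_classify([0.8, 0.2], cases, labels, cosine_similarity, k=3)
    print(label)  # cat
    for score, case, case_label in neighbors:
        print(f"{score:.3f} {case} {case_label}")
```

Replacing `cosine_similarity` with a learned pairwise measure requires no change to `knn_classify`. This separation mirrors the abstract's framing, in which the similarity measure is the component under study and k-NN is the classifier built on top of it.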