Inspecting class hierarchies in classification-based metric learning models

From arXiv: The main manuscript is 22 pages; the whole paper is 49 pages. The code for our experiments will be available at https://github.com/hjk92g/Inspecting_Hierarchies_ML . The plankton datasets are available from the Norwegian Marine Data Center (MicroS: https://doi.org/10.21335/NMDC-2102309336 , MicroL: https://doi.org/10.21335/NMDC-573815973 , MesoZ: https://doi.org/10.21335/NMDC-1805578916 ).

Most classification models treat all misclassifications equally. However, different classes may be related, and in some classification problems these hierarchical relationships must be taken into account. One way to do so is to use hierarchical information during training. Unfortunately, this information is not available for all datasets. Many classification-based metric learning methods use class representatives in the embedding space to represent different classes. The relationships among the learned class representatives can then be used to estimate class hierarchical structures. If a predefined class hierarchy is available, the learned class representatives can be assessed to determine whether the metric learning model has learned semantic distances that match our prior knowledge. In this work, we train a softmax classifier and three metric learning models with several training options on benchmark and real-world datasets. In addition to standard classification accuracy, we evaluate two kinds of performance: hierarchical inference performance, assessed by inspecting the learned class representatives, and hierarchy-informed performance, i.e., classification performance and metric learning performance measured with respect to predefined hierarchical structures. Furthermore, we investigate how these measures are affected by the choice of model and training options. When our proposed ProxyDR model is trained without using predefined hierarchical structures, its hierarchical inference performance is significantly better than that of the popular NormFace model. Additionally, our model improves some hierarchy-informed performance measures under the same training options. We also found that convolutional neural networks (CNNs) with random weights correspond to the predefined hierarchies better than random chance.
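The idea of checking learned class representatives against a predefined hierarchy can be illustrated with a minimal sketch. It is not the evaluation protocol or any of the measures used in the paper. It assumes cosine distance between class representatives, edge-count distance between leaves of the hierarchy, and Pearson correlation as the agreement score. The class names, vectors, and hierarchy paths are hypothetical toy values, not data from the experiments.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def tree_distance(path_a, path_b):
    """Count the edges between two leaves, given their root-to-leaf paths."""
    shared = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        shared += 1
    return (len(path_a) - shared) + (len(path_b) - shared)


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Hypothetical class representatives (one vector per class), standing in
# for representatives that a trained metric learning model would provide.
proxies = {
    "cat": [1.0, 0.1, 0.0],
    "dog": [0.9, 0.3, 0.0],
    "car": [0.0, 0.2, 1.0],
    "bus": [0.1, 0.0, 0.9],
}

# Hypothetical predefined hierarchy, stored as root-to-leaf paths.
paths = {
    "cat": ("root", "animal", "cat"),
    "dog": ("root", "animal", "dog"),
    "car": ("root", "vehicle", "car"),
    "bus": ("root", "vehicle", "bus"),
}

# Compare the two distance notions over every unordered pair of classes.
pairs = list(combinations(sorted(proxies), 2))
embedding_d = [cosine_distance(proxies[a], proxies[b]) for a, b in pairs]
hierarchy_d = [tree_distance(paths[a], paths[b]) for a, b in pairs]

agreement = pearson(embedding_d, hierarchy_d)
print(f"correlation between embedding and hierarchy distances: {agreement:.3f}")
```

A correlation close to 1 would suggest that classes that are close in the hierarchy are also close in the embedding space. Running the same comparison on representatives from an untrained network would give a baseline to compare against.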
