We introduce two reference-free metrics for evaluating the quality of taxonomies when no gold-standard labels are available. The first metric evaluates robustness by calculating the correlation between semantic similarity and taxonomic similarity, which captures error types that existing metrics do not consider. The second uses Natural Language Inference to assess logical adequacy. Both metrics are tested on five taxonomies and are shown to correlate well with F1 computed against ground-truth taxonomies. We further demonstrate that, when applied to label hierarchies, our metrics can predict downstream performance in hierarchical classification.
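To make the idea behind the first metric concrete, the following is a minimal sketch of correlating semantic similarity with taxonomic similarity over all pairs of nodes. It is an illustration under stated assumptions, not the method as defined in the paper: the abstract does not specify the similarity functions or the correlation coefficient, so cosine similarity over hand-made toy vectors, a Wu-Palmer-style taxonomic similarity, and Pearson correlation are all stand-ins chosen here. The names `robustness_score`, `taxonomic_similarity`, and the toy taxonomies are hypothetical.

```python
from itertools import combinations
from math import sqrt


def ancestors(node, parent):
    """Return the path from `node` up to the root, starting with `node`."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def taxonomic_similarity(a, b, parent):
    """Wu-Palmer-style similarity (an assumption, not the paper's definition):
    2 * depth(lca) / (depth(a) + depth(b)), with the root at depth 1."""
    path_a, path_b = ancestors(a, parent), ancestors(b, parent)
    in_b = set(path_b)
    lca = next(n for n in path_a if n in in_b)  # lowest common ancestor
    depth_lca = len(ancestors(lca, parent))
    return 2 * depth_lca / (len(path_a) + len(path_b))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))


def pearson(xs, ys):
    """Pearson correlation coefficient (assumed; the abstract says only 'correlation')."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def robustness_score(parent, embeddings):
    """Correlate semantic and taxonomic similarity over all embedded node pairs."""
    sem, tax = [], []
    for a, b in combinations(sorted(embeddings), 2):
        sem.append(cosine(embeddings[a], embeddings[b]))
        tax.append(taxonomic_similarity(a, b, parent))
    return pearson(sem, tax)


# Toy embeddings (hypothetical): mammals point along x, birds along y.
embeddings = {
    "dog": (1.0, 0.1), "cat": (0.9, 0.2),
    "sparrow": (0.1, 1.0), "eagle": (0.2, 0.9),
}

# A sensible taxonomy, given as child -> parent links.
good = {"dog": "mammal", "cat": "mammal", "sparrow": "bird", "eagle": "bird",
        "mammal": "animal", "bird": "animal"}

# A corrupted taxonomy in which "cat" and "sparrow" have swapped parents.
bad = {"dog": "mammal", "sparrow": "mammal", "cat": "bird", "eagle": "bird",
       "mammal": "animal", "bird": "animal"}

good_score = robustness_score(good, embeddings)
bad_score = robustness_score(bad, embeddings)
print(f"good taxonomy: {good_score:.3f}, corrupted taxonomy: {bad_score:.3f}")
```

In this toy setting the sensible taxonomy yields a strongly positive correlation, while the corrupted one yields a negative correlation, because semantically close terms end up far apart in the tree. In practice the embeddings would come from a text encoder rather than being written by hand, and a rank correlation could be substituted for Pearson.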