Taxonomies play a crucial role in many applications by providing a structured representation of knowledge. The task of taxonomy expansion integrates emerging concepts into an existing taxonomy by identifying an appropriate parent concept for each new query concept. Previous approaches typically rely on self-supervised methods that generate annotation data from the existing taxonomy. However, these methods are less effective when the existing taxonomy is small (fewer than 100 entities), since a small taxonomy yields little self-supervision data. In this work, we introduce \textsc{CodeTaxo}, a novel approach that leverages large language models through code-language prompts to capture taxonomic structure. Extensive experiments on five real-world benchmarks from different domains demonstrate that \textsc{CodeTaxo} consistently achieves superior performance across all evaluation metrics, significantly outperforming previous state-of-the-art methods. The code and data are available at \url{https://github.com/QingkaiZeng/CodeTaxo-Pub}.
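To make the idea of a code-language prompt concrete, the following is a minimal sketch of how a taxonomy might be serialized as code for a language model to complete. It is an illustration under stated assumptions, not the prompt format used by \textsc{CodeTaxo}: the `Entity` class, its fields, and the helpers `add_edge` and `render_code_prompt` are hypothetical names introduced here, and the example concepts are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entity:
    """One taxonomy node. Field names are illustrative assumptions."""
    name: str
    parent: Optional[str] = None
    children: List[str] = field(default_factory=list)


def add_edge(taxonomy: Dict[str, Entity], parent: str, child: str) -> None:
    """Record a parent -> child relation, creating nodes as needed."""
    taxonomy.setdefault(parent, Entity(parent))
    taxonomy.setdefault(child, Entity(child))
    taxonomy[child].parent = parent
    if child not in taxonomy[parent].children:
        taxonomy[parent].children.append(child)


def render_code_prompt(taxonomy: Dict[str, Entity], query: str) -> str:
    """Serialize the taxonomy as code-like statements and leave the
    query concept's parent unfinished for the model to complete."""
    lines = ["# Existing taxonomy, one statement per entity"]
    for name in sorted(taxonomy):
        e = taxonomy[name]
        lines.append(
            f"Entity(name={e.name!r}, parent={e.parent!r}, "
            f"children={sorted(e.children)!r})"
        )
    lines.append("# New query concept: fill in the parent")
    lines.append(f"Entity(name={query!r}, parent=")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical toy taxonomy, for illustration only.
    toy: Dict[str, Entity] = {}
    add_edge(toy, "science", "physics")
    add_edge(toy, "science", "biology")
    print(render_code_prompt(toy, "quantum mechanics"))
```

The sketch only builds the prompt text; sending it to a model and parsing the predicted parent are left out, since those steps depend on the model interface in use.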