FLAME: Self-Supervised Low-Resource Taxonomy Expansion using Large Language Models

A taxonomy is a tree-like (arborescent) hierarchy that relates entities to convey knowledge within a specific domain. Each edge in the taxonomy denotes a hypernym-hyponym relationship. Taxonomies are used in many real-world applications, such as e-commerce search engines and recommendation systems, so they need to be extended over time. However, manually curating taxonomies with new data is difficult because human resources are limited and data grows exponentially. Automatic taxonomy expansion methods are therefore needed. Traditional supervised taxonomy expansion approaches struggle in this setting because existing taxonomies are small, and the resulting scarcity of training data often leads to overfitting. In this paper, we propose FLAME, a novel approach for taxonomy expansion in low-resource environments that harnesses large language models (LLMs) trained on extensive real-world knowledge. LLMs help compensate for the scarcity of domain-specific knowledge. Specifically, FLAME uses few-shot prompting to extract the knowledge inherent in the LLMs and identify hypernym entities within the taxonomy. It further fine-tunes the LLMs with reinforcement learning, which yields more accurate predictions. Experiments on three real-world benchmark datasets demonstrate the effectiveness of FLAME, which improves accuracy by 18.5% and the Wu & Palmer metric by 12.3% over eight baselines. We also examine the strengths and weaknesses of FLAME through an extensive case study, error analysis, and ablation studies on the benchmarks.
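
To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not FLAME's actual implementation. It assumes a toy taxonomy stored as child-to-parent edges, a hypothetical prompt template (`few_shot_prompt`) that reuses existing edges as few-shot demonstrations, and the standard Wu & Palmer similarity, 2 * depth(LCA) / (depth(a) + depth(b)), with the root at depth 1. All entity names and function names here are illustrative assumptions.

```python
from typing import Dict, Optional

# Toy taxonomy as child -> parent (hypernym) edges; the root maps to None.
# These entities are illustrative and do not come from the paper's datasets.
PARENT: Dict[str, Optional[str]] = {
    "food": None,
    "fruit": "food",
    "vegetable": "food",
    "apple": "fruit",
    "banana": "fruit",
    "carrot": "vegetable",
}


def path_to_root(node: str, parent: Dict[str, Optional[str]]) -> list:
    """Return the list of nodes from `node` up to and including the root."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def wu_palmer(a: str, b: str, parent: Dict[str, Optional[str]]) -> float:
    """Wu & Palmer similarity between two taxonomy nodes (root depth = 1)."""
    path_a = path_to_root(a, parent)
    ancestors_b = set(path_to_root(b, parent))
    # The first node on a's upward path that is also an ancestor of b
    # (or b itself) is the lowest common ancestor.
    lca = next(n for n in path_a if n in ancestors_b)

    def depth(n: str) -> int:
        return len(path_to_root(n, parent))

    return 2 * depth(lca) / (depth(a) + depth(b))


def few_shot_prompt(query: str, parent: Dict[str, Optional[str]], k: int = 3) -> str:
    """Build a hypothetical few-shot prompt from k existing taxonomy edges.

    The template is an assumption for illustration, not the paper's prompt.
    """
    shots = [(c, p) for c, p in parent.items() if p is not None][:k]
    blocks = [f"Term: {c}\nHypernym: {p}" for c, p in shots]
    blocks.append(f"Term: {query}\nHypernym:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Prompt that would be sent to an LLM to place the new term "mango".
    print(few_shot_prompt("mango", PARENT))
    # Score a predicted parent against the gold parent "fruit".
    print(wu_palmer("fruit", "fruit", PARENT))      # exact match
    print(wu_palmer("vegetable", "fruit", PARENT))  # sibling of the gold parent
```

In this sketch, a prediction that exactly matches the gold parent scores 1.0, while a near miss such as a sibling category receives partial credit. This is why the Wu & Palmer metric is reported alongside plain accuracy.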
