Hallucinations of large language models (LLMs) commonly occur in domain-specific downstream tasks, and ontology matching (OM) is no exception. The growing use of LLMs for OM raises the need for benchmarks to better understand LLM hallucinations. The OAEI-LLM dataset is an extended version of the Ontology Alignment Evaluation Initiative (OAEI) datasets that evaluates LLM-specific hallucinations in OM tasks. We outline the methodology used in dataset construction and schema extension, and provide examples of potential use cases.