In contrast to large text corpora, knowledge graphs (KGs) provide dense and structured representations of factual information. This makes them attractive for systems that supplement or ground the knowledge found in pre-trained language models with an external knowledge source. This is especially true for classification tasks, where recent work has focused on pipeline models that retrieve information from KGs such as ConceptNet as additional context. Many of these models consist of multiple components, and although they differ in the number and nature of these components, they share one step: given a text query, they attempt to identify and retrieve a relevant subgraph from the KG. Because KGs are often noisy and idiosyncratic, it is not known how current methods compare to a scenario in which the aligned subgraph is completely relevant to the query. In this work, we address this gap by reviewing current approaches to text-to-KG alignment and evaluating them on two datasets for which manually created graphs are available, providing insights into the effectiveness of current methods.