Massive knowledge graphs (KGs) such as Wikidata attempt to capture world knowledge about multiple entities. Recent approaches concentrate on automatically enriching these KGs from text. However, much of the information available as natural-language text in low-resource languages is missed. Cross-lingual information extraction aims to extract factual information, in the form of English triples, from text in low-resource Indian languages. Despite its considerable potential, progress on this task lags behind that on monolingual information extraction. In this paper, we propose the task of Cross-Lingual Fact Extraction (CLFE) from text and devise an end-to-end generative approach for it, which achieves an overall F1 score of 77.46.