Relation extraction is an efficient way to mine the vast wealth of human knowledge on the Web. Existing methods either rely on domain-specific training data or produce noisy outputs. Here we focus on extracting targeted relations from semi-structured web pages, given only a short description of the relation. We present GraphScholarBERT, an open-domain information extraction method based on a joint graph and language model structure. GraphScholarBERT can generalize to previously unseen domains without additional data or training, and it produces only clean extraction results that match the search keyword. Experiments show that GraphScholarBERT can improve extraction F1 scores by up to 34.8\% over previous work in zero-shot domain and zero-shot website settings.