We present gBuilder, a user-friendly and scalable knowledge graph construction (KGC) system for extracting structured knowledge from unstructured corpora. Unlike existing KGC systems, gBuilder provides a flexible, user-defined pipeline that can keep pace with the rapid development of information extraction (IE) models. It offers built-in template-based or heuristic operators, along with programmable operators, for adapting to data from different domains. We also design cloud-based self-adaptive task scheduling for gBuilder to ensure its scalability on large-scale knowledge graph construction. Experimental evaluation demonstrates that gBuilder can organize multiple IE models for knowledge graph construction on a uniform platform, and confirms its high scalability on large-scale KGC tasks.