Information extraction (IE) annotation toolkits are essential for constructing or extending entity-centric and event-centric knowledge graphs (KGs and EKGs). However, existing IE toolkits have several non-trivial limitations, such as lacking support for multiple tasks and for automatic updates. In this work, we present CollabKG, a learnable human-machine-cooperative IE toolkit for KG and EKG construction. To address the multi-task issue, CollabKG unifies different IE subtasks, including named entity recognition (NER), entity-relation triple extraction (RE), and event extraction (EE), and supports both KG and EKG. Building on advanced prompting-based IE techniques, we then present a human-machine-cooperation mechanism that uses large language models (LLMs) as the machine assistant, which lowers annotation cost while improving performance. Finally, owing to the two-way interaction between human and machine, CollabKG can learn from that interaction and update itself. CollabKG also offers several appealing features (e.g., customization, training-free operation, and propagation) that make the system powerful, easy to use, and highly productive. We holistically compare our toolkit with existing tools on these features. Human evaluation shows quantitatively that CollabKG significantly improves annotation quality, efficiency, and stability simultaneously.
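To make the idea of unifying NER, RE, and EE under one annotation scheme concrete, the following is a minimal sketch of how the three subtasks could share a single span-based record. It is an illustration under our own assumptions, not CollabKG's actual data model: the names `Span`, `Relation`, `Event`, `AnnotatedText`, and `accept` are hypothetical, as are the labels and the example sentence.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    label: str   # entity type, or a trigger marker for events


@dataclass(frozen=True)
class Relation:
    head: Span
    tail: Span
    label: str


@dataclass(frozen=True)
class Event:
    trigger: Span
    event_type: str
    arguments: tuple = ()   # pairs of (role, Span)


@dataclass
class AnnotatedText:
    """Hypothetical container holding all three annotation kinds for one text."""
    text: str
    entities: set = field(default_factory=set)
    relations: set = field(default_factory=set)
    events: set = field(default_factory=set)

    def accept(self, item):
        """Store an annotation the human accepted, routed by subtask."""
        if isinstance(item, Span):
            self.entities.add(item)
        elif isinstance(item, Relation):
            # A relation implies both of its endpoint entities.
            self.entities.update((item.head, item.tail))
            self.relations.add(item)
        elif isinstance(item, Event):
            # Event arguments are entities; the trigger is kept on the event only.
            self.entities.update(span for _, span in item.arguments)
            self.events.add(item)
        else:
            raise TypeError(f"unsupported annotation: {item!r}")


# Example: one sentence annotated for all three subtasks.
doc = AnnotatedText("Marie Curie won the Nobel Prize in 1903.")
curie = Span(0, 11, "PERSON")
prize = Span(20, 31, "AWARD")
doc.accept(Relation(curie, prize, "won"))
doc.accept(Event(Span(12, 15, "TRIGGER"), "Award",
                 (("recipient", curie), ("award", prize))))
```

In this sketch, entities introduced by a relation or an event are stored once in a shared set, so the same record can feed either an entity-centric or an event-centric graph. In a human-machine-cooperative setting, `accept` would be called only for machine suggestions that the annotator confirms.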