The paper introduces EICopilot, a novel agent-based solution that enhances search and exploration of enterprise registration data within extensive online knowledge graphs, such as those detailing legal entities, registered capital, and major shareholders. Traditional methods require text-based queries and manual subgraph exploration, which is often time-consuming. EICopilot, deployed as a chatbot via Baidu Enterprise Search, addresses this by using Large Language Models (LLMs) to interpret natural language queries. It automatically generates and executes Gremlin scripts and returns concise summaries of complex enterprise relationships. Its distinctive features include a data pre-processing pipeline that compiles and annotates representative queries into a vector database of examples for in-context learning (ICL), a comprehensive reasoning pipeline that combines Chain-of-Thought with ICL to improve Gremlin script generation for knowledge graph search and exploration, and a novel query masking strategy that improves intent recognition and thereby script accuracy. Empirical evaluations show that EICopilot outperforms baseline methods in both speed and accuracy, with the \emph{Full Mask} variant reducing the syntax error rate to as low as 10.00% and achieving an execution correctness of up to 82.14%. Together, these components improve querying and summarization of intricate datasets, positioning EICopilot as a groundbreaking tool for exploring and exploiting large-scale knowledge graphs in enterprise information search.
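To make the interplay of query masking, example retrieval, and prompt construction more concrete, the following is a minimal sketch and not the authors' implementation. It rests on several assumptions: the `[COMPANY]` placeholder, the example questions, the graph schema in the Gremlin templates (labels such as `invests_in` and `legal_rep`), and all function names are hypothetical, and a simple bag-of-words cosine similarity stands in for the embedding-based vector database described in the abstract.

```python
import math
import re
from collections import Counter

# Hypothetical annotated examples: (masked question, Gremlin template).
# The schema (labels, property keys) is assumed for illustration only.
EXAMPLES = [
    ("who are the major shareholders of [COMPANY]",
     "g.V().has('company', 'name', NAME).in('invests_in').values('name')"),
    ("what is the registered capital of [COMPANY]",
     "g.V().has('company', 'name', NAME).values('registered_capital')"),
    ("who is the legal representative of [COMPANY]",
     "g.V().has('company', 'name', NAME).out('legal_rep').values('name')"),
]


def mask_entities(query, entities):
    """Replace known entity mentions with a placeholder so that
    retrieval depends on the intent rather than the specific name."""
    masked = query
    # Longest names first, so a shorter name cannot split a longer one.
    for name in sorted(entities, key=len, reverse=True):
        masked = re.sub(re.escape(name), "[COMPANY]", masked,
                        flags=re.IGNORECASE)
    return masked


def bag_of_words(text):
    """Toy stand-in for an embedding: lowercase token counts."""
    return Counter(re.findall(r"\[?\w+\]?", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(masked_query, k=1):
    """Return the k stored examples most similar to the masked query."""
    q = bag_of_words(masked_query)
    ranked = sorted(EXAMPLES,
                    key=lambda ex: cosine(q, bag_of_words(ex[0])),
                    reverse=True)
    return ranked[:k]


def build_prompt(query, entities, k=2):
    """Assemble a few-shot prompt with a step-by-step instruction."""
    masked = mask_entities(query, entities)
    parts = ["Think step by step, then write a Gremlin script."]
    for question, script in retrieve(masked, k):
        parts.append(f"Q: {question}\nGremlin: {script}")
    parts.append(f"Q: {masked}\nGremlin:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("Who are the major shareholders of Acme Corp?",
                       ["Acme Corp"]))
```

In this sketch, masking makes two questions about different companies map to the same retrieval key, which is one plausible reading of how masking could help intent recognition. A real system would also need an entity recognizer to supply the `entities` list and an LLM call to complete the prompt, both of which are omitted here.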