Extracting relevant and structured knowledge from large, complex technical documents in the Reliability and Maintainability (RAM) domain is labor-intensive and error-prone. Our work addresses this challenge by presenting OntoKGen, an end-to-end pipeline for ontology extraction and Knowledge Graph (KG) generation. OntoKGen leverages Large Language Models (LLMs) through an interactive user interface guided by our adaptive, iterative Chain of Thought (CoT) algorithm, ensuring that the ontology extraction process, and thus the KG generation, aligns with user-specific requirements. Although KG generation follows a clear, structured path based on the confirmed ontology, there is no universally correct ontology, as ontology design inherently reflects the user's preferences. OntoKGen recommends an ontology grounded in best practices, minimizing user effort and surfacing insights that might otherwise be overlooked, while giving the user complete control over the final ontology. Once the KG has been generated from the confirmed ontology, OntoKGen enables seamless integration into schemaless, non-relational databases such as Neo4j. This integration allows flexible storage and retrieval of knowledge from diverse, unstructured sources, facilitating advanced querying, analysis, and decision-making. Moreover, the generated KG serves as a robust foundation for future integration into Retrieval Augmented Generation (RAG) systems, offering enhanced capabilities for developing domain-specific intelligent applications.
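To make the Neo4j integration step concrete, the sketch below shows one plausible way ontology-conformant triples could be rendered as idempotent Cypher `MERGE` statements for a schemaless graph store. The triple format, the `Entity` node label, and the example RAM facts are illustrative assumptions, not part of OntoKGen itself.

```python
# Hypothetical sketch: render (subject, relation, object) triples as Cypher.
# The "Entity" label and the sample triples are assumptions for illustration.

def triples_to_cypher(triples):
    """Render extracted triples as idempotent Cypher MERGE statements."""
    statements = []
    for subj, rel, obj in triples:
        s = subj.replace("'", "\\'")  # escape single quotes for Cypher strings
        o = obj.replace("'", "\\'")
        # Relationship types are conventionally UPPER_SNAKE_CASE in Cypher
        r = rel.upper().replace(" ", "_")
        statements.append(
            f"MERGE (a:Entity {{name: '{s}'}}) "
            f"MERGE (b:Entity {{name: '{o}'}}) "
            f"MERGE (a)-[:{r}]->(b)"
        )
    return statements

# Example triples of the kind an LLM-based extractor might emit
# from a RAM document (hypothetical content)
triples = [
    ("Pump P-101", "has failure mode", "Seal Leak"),
    ("Seal Leak", "mitigated by", "Preventive Maintenance"),
]
for stmt in triples_to_cypher(triples):
    print(stmt)
```

Using `MERGE` rather than `CREATE` keeps the load idempotent, so re-running the pipeline over the same document does not duplicate nodes or edges; the generated statements can then be executed through any Neo4j client.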