Automatic Ontology Construction Using LLMs as an External Layer of Memory, Verification, and Planning for Hybrid Intelligent Systems

This paper presents a hybrid architecture for intelligent systems in which large language models (LLMs) are extended with an external ontological memory layer. Instead of relying solely on parametric knowledge and vector-based retrieval-augmented generation (RAG), the proposed approach constructs and maintains a structured knowledge graph in RDF/OWL, enabling persistent, verifiable, and semantically grounded reasoning. The core contribution is an automated pipeline for constructing ontologies from heterogeneous data sources, including documents, APIs, and dialogue logs. The pipeline performs entity recognition, relation extraction, normalization, and triple generation; the resulting triples are then validated against SHACL and OWL constraints and used to update the graph continuously. During inference, LLMs operate over a combined context that integrates vector-based retrieval, graph-based reasoning, and interaction with external tools. Experimental observations on planning tasks, including the Tower of Hanoi benchmark, indicate that ontology augmentation improves performance on multi-step reasoning compared with baseline LLM systems. In addition, the ontology layer enables formal validation of generated outputs, turning the system into a generation-verification-correction pipeline. The proposed architecture addresses key limitations of current LLM-based systems, including the lack of long-term memory, weak structural understanding, and limited reasoning capabilities. It provides a foundation for agent-based systems, robotics applications, and enterprise AI solutions that require persistent knowledge, explainability, and reliable decision-making.
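The normalization, triple generation, validation, and graph-update stages can be illustrated with a small plain-Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation. The `ex:` namespace, the CamelCase normalization rule, the `PropertyShape` and `KnowledgeGraph` names, and the example robot facts are all hypothetical. The shape checks only mimic two SHACL constraints (`sh:class` and `sh:maxCount`) and perform no OWL inference. The extraction step, which an LLM would perform in the described system, is replaced here by hand-written statements. A production system would more likely use an RDF library such as rdflib together with a real SHACL validator such as pySHACL.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def normalize(term: str) -> str:
    """Map a surface string to a prefixed IRI-like name (hypothetical scheme).

    Terms that already carry a prefix (e.g. "rdf:type") are kept unchanged;
    anything else becomes "ex:" plus a CamelCase local name.
    """
    if ":" in term:
        return term
    words = re.findall(r"[A-Za-z0-9]+", term)
    return "ex:" + "".join(w[0].upper() + w[1:] for w in words)


def to_triples(raw: Iterable[Tuple[str, str, str]]) -> Set[Triple]:
    """Triple generation: normalize every position of the extracted statements."""
    return {(normalize(s), normalize(p), normalize(o)) for s, p, o in raw}


@dataclass
class PropertyShape:
    """Simplified analogue of a SHACL property shape (not the SHACL standard itself)."""
    predicate: str
    object_class: Optional[str] = None  # plays the role of sh:class
    max_count: Optional[int] = None     # plays the role of sh:maxCount


class KnowledgeGraph:
    def __init__(self, shapes: Iterable[PropertyShape]) -> None:
        self.triples: Set[Triple] = set()
        self.shapes = {shape.predicate: shape for shape in shapes}

    def validate(self, candidates: Set[Triple]) -> List[str]:
        """Check candidates against the shapes, in the context of the existing graph."""
        combined = self.triples | candidates
        violations: List[str] = []
        for s, p, o in sorted(candidates):
            shape = self.shapes.get(p)
            if shape is None:
                continue  # predicates without a shape are unconstrained
            # Direct rdf:type lookup only; no rdfs:subClassOf or OWL inference.
            if shape.object_class is not None and (o, "rdf:type", shape.object_class) not in combined:
                violations.append(f"{s} {p} {o}: object lacks rdf:type {shape.object_class}")
            if shape.max_count is not None:
                count = sum(1 for s2, p2, _ in combined if s2 == s and p2 == p)
                message = f"{s} {p}: {count} values exceed maxCount {shape.max_count}"
                if count > shape.max_count and message not in violations:
                    violations.append(message)
        return violations

    def commit(self, candidates: Set[Triple]) -> List[str]:
        """Continuous update: add the batch only if it passes validation (all-or-nothing)."""
        violations = self.validate(candidates)
        if not violations:
            self.triples |= candidates
        return violations


# Usage with hypothetical extractor output.
kg = KnowledgeGraph([PropertyShape("ex:locatedIn", object_class="ex:Room", max_count=1)])
batch1 = to_triples([
    ("robot arm 1", "rdf:type", "Robot"),
    ("Lab A", "rdf:type", "Room"),
    ("robot arm 1", "ex:locatedIn", "Lab A"),
])
print(kg.commit(batch1))  # prints []

# A second location for the same robot, pointing at an untyped node, is rejected.
for violation in kg.commit(to_triples([("robot arm 1", "ex:locatedIn", "Warehouse")])):
    print(violation)
```

Rejecting the whole batch keeps the stored graph consistent, and the returned violation messages are the kind of structured feedback a generation-verification-correction loop could pass back to the LLM when requesting a corrected output.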