Large Language Models (LLMs) combined with Retrieval-Augmented Generation (RAG) and knowledge graphs offer new opportunities for interacting with engineering diagrams such as Piping and Instrumentation Diagrams (P&IDs). However, directly processing raw images or smart P&ID files with LLMs is often costly, inefficient, and prone to hallucinations. This work introduces ChatP&ID, an agentic framework that enables grounded and cost-effective natural-language interaction with P&IDs using Graph Retrieval-Augmented Generation (GraphRAG), a paradigm we refer to as GraphRAG for engineering diagrams. Smart P&IDs encoded in the DEXPI standard are transformed into structured knowledge graphs, which serve as the basis for graph-based retrieval and reasoning by LLM agents. This approach enables reliable querying of engineering diagrams while significantly reducing computational cost. Benchmarking across commercial LLM APIs (OpenAI, Anthropic) demonstrates that graph-based representations improve accuracy by 18% over raw image inputs and reduce token costs by 85% compared to directly ingesting smart P&ID files. While small open-source models still struggle to interpret knowledge graph formats and structured engineering data, integrating them with VectorRAG and PathRAG improves response accuracy by up to 40%. Notably, GPT-5-mini combined with ContextRAG achieves 91% accuracy at a cost of only $0.004 per task. The resulting ChatP&ID interface enables intuitive natural-language interaction with complex engineering diagrams and lays the groundwork for AI-assisted process engineering tasks such as Hazard and Operability Studies (HAZOP) and multi-agent analysis.
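The conversion of a smart P&ID into a knowledge graph and the retrieval of a small, query-relevant subgraph can be illustrated with a minimal sketch. This is not the ChatP&ID implementation: the XML below is a hypothetical, heavily simplified stand-in for a DEXPI export, and the element names (`Equipment`, `Connection`), the `flows_to` relation, and the helper functions `build_graph`, `retrieve_subgraph`, and `to_context` are all assumptions made for illustration. The point is only that sending a few retrieved triples to an LLM, rather than the full file, is what could plausibly keep token costs low.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict, deque

# Hypothetical, heavily simplified stand-in for a DEXPI export.
# Real DEXPI files follow a much richer schema than this toy example.
TOY_PID_XML = """
<Plant>
  <Equipment ID="T-101" Class="Tank"/>
  <Equipment ID="P-101" Class="Pump"/>
  <Equipment ID="V-101" Class="Valve"/>
  <Equipment ID="E-101" Class="HeatExchanger"/>
  <Connection From="T-101" To="P-101"/>
  <Connection From="P-101" To="V-101"/>
  <Connection From="V-101" To="E-101"/>
</Plant>
"""


def build_graph(xml_text):
    """Parse the toy XML into a node table and a directed adjacency map."""
    root = ET.fromstring(xml_text)
    nodes = {e.get("ID"): e.get("Class") for e in root.iter("Equipment")}
    edges = defaultdict(set)
    for conn in root.iter("Connection"):
        edges[conn.get("From")].add(conn.get("To"))
    return nodes, edges


def retrieve_subgraph(edges, seed, hops=1):
    """Return sorted (source, relation, target) triples within `hops` of `seed`.

    Neighbours are explored in both directions so that upstream and
    downstream equipment are both retrieved.
    """
    undirected = defaultdict(set)
    for src, targets in edges.items():
        for dst in targets:
            undirected[src].add(dst)
            undirected[dst].add(src)

    # Breadth-first search bounded by the hop limit.
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == hops:
            continue
        for neighbour in undirected[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))

    # Keep only the directed edges whose endpoints were both reached.
    return sorted(
        (src, "flows_to", dst)
        for src, targets in edges.items()
        for dst in targets
        if src in seen and dst in seen
    )


def to_context(nodes, triples):
    """Serialise triples as compact text suitable for an LLM prompt."""
    return "\n".join(
        f"{src} ({nodes[src]}) -> {dst} ({nodes[dst]})" for src, _, dst in triples
    )


nodes, edges = build_graph(TOY_PID_XML)
triples = retrieve_subgraph(edges, seed="P-101", hops=1)
print(to_context(nodes, triples))
```

For the question "what is connected to pump P-101?", only the two triples touching P-101 would be placed in the prompt; the heat exchanger, which is two hops away, is left out until the hop limit is raised.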