AI Agent for Reverse-Engineering Legacy Finite-Difference Code and Translating to Devito

To facilitate the migration of legacy finite-difference implementations to the Devito environment, this study develops an integrated AI agent framework. The system's hybrid LangGraph architecture combines Retrieval-Augmented Generation (RAG) with open-source large language models (LLMs) through multi-stage iterative workflows. The agent builds an extensive Devito knowledge graph through document parsing, structure-aware segmentation, entity-relationship extraction, and Leiden-based community detection. GraphRAG optimization improves query performance across semantic communities that include seismic wave simulation, computational fluid dynamics, and performance-tuning libraries. A reverse-engineering component statically analyzes Fortran source code to derive three-level query strategies for RAG retrieval. To give the language model precise contextual guidance, a multi-stage retrieval pipeline performs parallel search, concept expansion, community-level retrieval, and semantic similarity analysis. Code synthesis is governed by Pydantic-based constraints that enforce structured, reliable outputs. A comprehensive validation framework combines conventional static analysis with the G-Eval approach, covering execution correctness, structural soundness, mathematical consistency, and API compliance. The overall workflow, implemented in LangGraph, uses concurrent processing to support quality-based iterative refinement and state-aware dynamic routing. The principal contribution is the incorporation of feedback mechanisms inspired by reinforcement learning, which enable a transition from static code translation toward dynamic, adaptive analysis.
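The abstract does not say how the Fortran static analysis works or what the three query levels are. The sketch below is one possible reading under stated assumptions: a regex scan of free-form Fortran that records counted DO-loop nesting and array accesses of the form `var +/- constant`, which are then mapped to three hypothetical query levels (`concept`, `api`, `code`). The names `analyse`, `three_level_queries` and `FortranFeatures`, the level names, and the "extra loop means time stepping" heuristic are all illustrative assumptions, not the authors' implementation; a real tool would use a proper Fortran parser rather than regular expressions.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Counted DO loops only ("do i = 1, n"); DO WHILE and labelled DO are not handled.
DO_RE = re.compile(r"^\s*do\s+(\w+)\s*=", re.IGNORECASE)
END_DO_RE = re.compile(r"^\s*end\s*do\b", re.IGNORECASE)
# Array reference such as u(i+1, j): captures the name and the index list.
# Intrinsic calls like max(i, j) would also match; a real parser would resolve them.
ARRAY_REF_RE = re.compile(r"\b(\w+)\s*\(([^()]*)\)")
# One index of the form var, var+k or var-k.
INDEX_RE = re.compile(r"\s*(\w+)\s*(?:([+-])\s*(\d+))?\s*")


@dataclass
class FortranFeatures:
    """Hypothetical summary of the stencil structure found in one source file."""
    loop_vars: list[str] = field(default_factory=list)
    max_nest_depth: int = 0
    stencil_offsets: dict[str, set[tuple[int, ...]]] = field(default_factory=dict)


def analyse(source: str) -> FortranFeatures:
    feats = FortranFeatures()
    depth = 0
    for line in source.splitlines():
        code = line.split("!", 1)[0]  # strip trailing Fortran comments
        if m := DO_RE.match(code):
            depth += 1
            feats.max_nest_depth = max(feats.max_nest_depth, depth)
            if m.group(1).lower() not in feats.loop_vars:
                feats.loop_vars.append(m.group(1).lower())
            continue
        if END_DO_RE.match(code):
            depth = max(depth - 1, 0)
            continue
        for name, args in ARRAY_REF_RE.findall(code):
            offsets = []
            for arg in args.split(","):
                m = INDEX_RE.fullmatch(arg)
                if not m or m.group(1).lower() not in feats.loop_vars:
                    break  # not a pure "loop variable +/- constant" index
                k = int(m.group(3) or 0)
                offsets.append(-k if m.group(2) == "-" else k)
            else:  # every index was loop-variable based: record the offset tuple
                feats.stencil_offsets.setdefault(name.lower(), set()).add(tuple(offsets))
    return feats


def three_level_queries(feats: FortranFeatures) -> dict[str, list[str]]:
    """Map extracted features to three hypothetical query levels for retrieval."""
    tuples = [t for offs in feats.stencil_offsets.values() for t in offs]
    ndim = max((len(t) for t in tuples), default=0)
    radius = max((abs(k) for t in tuples for k in t), default=0)
    concept = [f"{ndim}D finite-difference stencil of radius {radius}"]
    if feats.max_nest_depth > ndim:
        # Heuristic: an extra enclosing loop is read as explicit time stepping.
        concept.append("explicit time-stepping loop")
    return {
        "concept": concept,
        # A centred stencil of radius r corresponds to space_order = 2r in Devito.
        "api": [f"Devito Grid with {ndim} dimensions", f"Devito space_order={2 * radius}"],
        "code": [f"array {name} accessed at offsets {sorted(offs)}"
                 for name, offs in sorted(feats.stencil_offsets.items())],
    }


# Usage: a legacy 2D explicit diffusion update (Jacobi-style five-point stencil).
SAMPLE = """
do n = 1, nt
  do j = 2, ny - 1
    do i = 2, nx - 1
      unew(i, j) = u(i, j) + c * (u(i+1, j) + u(i-1, j) &
                 + u(i, j+1) + u(i, j-1) - 4.0 * u(i, j))
    end do
  end do
  u = unew
end do
"""
queries = three_level_queries(analyse(SAMPLE))
print(queries["concept"])
```

For this sample the scan finds a three-deep loop nest over two spatial indices and a radius-1 five-point stencil on `u`, so the concept level asks about a 2D radius-1 stencil with explicit time stepping, while the API level points retrieval at Devito grid and `space_order` documentation.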
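The abstract names four validation dimensions and a quality-based refinement loop but gives no scoring rule or thresholds. As a minimal sketch under assumptions: conventional static checks (here, Python's `ast` module) act as a hard gate; each of the four dimensions is rated on a 1 to 5 scale and scored G-Eval style, as the expected rating under the evaluator LLM's probabilities for each rating token; and a weighted sum decides whether to accept, refine, or stop. The weights, the 0.8 threshold, the three-iteration budget, and the names `static_check`, `g_eval_score` and `route` are hypothetical.

```python
from __future__ import annotations

import ast

# Hypothetical weights for the four validation dimensions named in the abstract.
WEIGHTS = {"execution": 0.4, "structure": 0.2, "math": 0.2, "api": 0.2}


def g_eval_score(rating_probs: dict[int, float]) -> float:
    """G-Eval-style score: expected rating under the evaluator's token probabilities.

    `rating_probs` maps each rating (1..5) to the probability the evaluator LLM
    assigned to that rating token; the probabilities are renormalised.
    """
    total = sum(rating_probs.values())
    return sum(r * p for r, p in rating_probs.items()) / total


def static_check(code: str) -> dict[str, bool]:
    """Cheap static gates on generated Devito code (illustrative, not exhaustive)."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return {"parses": False, "imports_devito": False, "builds_operator": False}
    nodes = list(ast.walk(tree))
    imports_devito = any(
        (isinstance(n, ast.ImportFrom) and n.module == "devito")
        or (isinstance(n, ast.Import) and any(a.name == "devito" for a in n.names))
        for n in nodes
    )
    # Matches both Operator(...) and devito.Operator(...) calls.
    builds_operator = any(
        isinstance(n, ast.Call)
        and (getattr(n.func, "id", None) == "Operator" or getattr(n.func, "attr", None) == "Operator")
        for n in nodes
    )
    return {"parses": True, "imports_devito": imports_devito, "builds_operator": builds_operator}


def route(code: str, dim_ratings: dict[str, dict[int, float]], iteration: int,
          max_iters: int = 3, threshold: float = 0.8) -> tuple[str, float]:
    """Pick the next workflow step from static gates plus weighted G-Eval scores."""
    if not all(static_check(code).values()):
        quality = 0.0  # a failed static gate overrides the LLM judgement
    else:
        # Map each 1..5 rating onto [0, 1] before weighting.
        quality = sum(w * (g_eval_score(dim_ratings[d]) - 1) / 4 for d, w in WEIGHTS.items())
    if quality >= threshold:
        return "accept", quality
    if iteration + 1 < max_iters:
        return "refine", quality
    return "stop", quality


# Usage: a candidate that passes the gates and is rated highly on every dimension.
candidate = "from devito import Grid, Operator\nop = Operator([])\n"
ratings = {d: {5: 0.9, 4: 0.1} for d in WEIGHTS}
print(route(candidate, ratings, iteration=0))
```

Making the static checks a hard gate keeps an LLM judge from rating code that does not even parse. In a LangGraph workflow, a thin wrapper returning only the decision label could serve as the path function of a conditional edge; the sketch stays framework-free so it runs on the standard library alone.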
