This paper presents an agentic multimodal retrieval-augmented generation (RAG) framework for domain-specific literature reasoning, instantiated on a curated corpus of several thousand papers in intelligent tires, vehicle dynamics, vehicle control, sensing, estimation, and machine learning. Unlike conventional single-pass RAG systems, the proposed architecture uses an autonomous, evidence-gated pipeline that classifies query intent, generates separate text and visual query rewrites, performs hybrid text retrieval with FAISS and BM25 followed by cross-encoder reranking, expands evidence through graph-guided chunk traversal over a Neo4j knowledge graph, and retrieves visual document evidence using ColSmol late-interaction embeddings with MUVERA fixed-dimensional encoding, approximate nearest-neighbor search, and MaxSim reranking. The framework scores evidence sufficiency using a 100-point rubric with hybrid rule-based/LLM review, retries retrieval through drift-guarded reformulation, searches external academic databases through optimize--search--vet loops, merges and deduplicates multimodal evidence, verifies citation integrity, and generates cited answers through Planner, Researcher, Writer, and Critic agents with self-correcting revision. Key contributions include: (i) a scalable multimodal retrieval architecture combining text, graph, and visual evidence over 40,000 document pages; (ii) an interpretable evidence sufficiency and retry mechanism; (iii) a multi-agent generation pipeline with evidence mapping and critic-driven revision; (iv) a domain knowledge graph with LLM-based entity extraction, OpenAlex author validation, and intra-corpus citation resolution; and (v) a route-dependent external search architecture for targeted literature expansion. The result is a practical, evidence-gated, multimodal agentic RAG architecture for technical reasoning over specialized research corpora.
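To make the MaxSim reranking stage of the visual retrieval path concrete, the following is a minimal sketch of late-interaction scoring: for each query-token embedding, take the maximum dot product over a page's patch embeddings, then sum these maxima over the query tokens. It is an illustration under stated assumptions, not the framework's implementation. The function names (`maxsim_score`, `rerank_pages`), the page identifiers, and the toy two-dimensional vectors are hypothetical; a real pipeline would score ColSmol embeddings for the candidates returned by the approximate nearest-neighbor stage.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction MaxSim: sum over query tokens of the best-matching page vector."""
    if not page_vecs:
        raise ValueError("page has no embeddings to score against")
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rerank_pages(
    query_vecs: Sequence[Vector],
    candidates: Dict[str, List[Vector]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rerank candidate pages (hypothetical ids) by MaxSim score, highest first."""
    scored = [(page_id, maxsim_score(query_vecs, vecs)) for page_id, vecs in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy example: two query-token vectors and three candidate pages (assumed data).
query = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "page_a": [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens
    "page_b": [[0.5, 0.5]],              # partial match to each token
    "page_c": [[0.6, 0.0]],              # matches only the first token
}
ranking = rerank_pages(query, candidates)
```

In this toy setting the page whose patch vectors align with both query tokens ranks first. In a two-stage design such as the one described above, the exact MaxSim computation is applied only to the short candidate list, so the expensive token-level comparison is not run across the whole page index.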