Intelligent Multimodal Retrieval and Reasoning for Geospatial Knowledge Discovery on the I-GUIDE Platform

Geospatial knowledge discovery increasingly requires search across heterogeneous artifacts: datasets, maps, notebooks, software, publications, and the provenance links among them. Conventional geoportals support metadata and spatial filtering, but they rarely provide semantic retrieval, graph-aware provenance traversal, and conversational synthesis in one integrated system. This paper presents I-GUIDE Smart Search, a production multimodal geospatial retrieval-augmented generation (RAG) system embedded in the I-GUIDE Platform, and reports on its design, deployment, and evaluation. The system combines production-maintained OpenSearch keyword, vector, and spatial indexes with a Neo4j knowledge graph and an iterative RAG pipeline that performs memory-aware query augmentation, reasoning, retrieval-method routing, relevance grading, grounded generation, and hallucination and relevance checking. In a single-A100 RAG deployment, I-GUIDE Smart Search supports interactive use by up to about 100 concurrent simulated users, reaching 4.4 requests per second with p50 latency near 25 seconds despite 20-50 LLM calls per query. For answer quality, we evaluate on a four-category benchmark of 170 unique, human-filtered, user-facing queries, together with ten intent-specific probe sets generated from the deployed indexes and graph. Smart Search improves retrieved-evidence coverage and judged answer quality over non-retrieval and naive-RAG baselines, with the clearest gains on exact-identifier, spatially constrained, simple-recommendation, and domain-specific factual queries that require current indexed evidence. We distill transferable deployment lessons for spatial RAG systems, covering spatial metadata quality, graph provenance, retrieval routing, interface contracts, refusal-aware evaluation, latency-cost tradeoffs, and the role of the user interface in deployed geospatial cyberinfrastructure.
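
As a rough consistency check on the load figures above, Little's law (requests in flight = throughput × time in system) can be applied to the reported numbers. The sketch below is illustrative only: it uses the p50 latency as a stand-in for the mean latency that Little's law strictly requires, it assumes the throughput and latency were measured at the same load level, and the function and constant names are hypothetical rather than taken from the system.

```python
# Back-of-envelope check of the reported load figures using Little's law.
# Assumption: p50 latency approximates mean latency (not stated in the source).

def in_flight_requests(throughput_rps: float, latency_s: float) -> float:
    """Little's law: average requests in flight = arrival rate * time in system."""
    return throughput_rps * latency_s


def llm_call_rate(throughput_rps: float, calls_per_query: int) -> float:
    """Aggregate LLM calls per second implied by a per-query call count."""
    return throughput_rps * calls_per_query


# Figures reported in the abstract.
THROUGHPUT_RPS = 4.4             # requests per second
P50_LATENCY_S = 25.0             # seconds, used here as a proxy for the mean
LLM_CALLS_PER_QUERY = (20, 50)   # reported lower and upper bounds

in_flight = in_flight_requests(THROUGHPUT_RPS, P50_LATENCY_S)
low_rate, high_rate = (llm_call_rate(THROUGHPUT_RPS, n) for n in LLM_CALLS_PER_QUERY)

print(f"Estimated requests in flight: {in_flight:.0f}")
print(f"Implied LLM calls per second: {low_rate:.0f} to {high_rate:.0f}")
```

Under these assumptions the estimate is about 110 requests in flight, which is in line with the roughly 100 concurrent simulated users reported, and the implied aggregate load is on the order of 88 to 220 LLM calls per second on the single GPU.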
