Structured Linked Data as a Memory Layer for Agent-Orchestrated Retrieval

Retrieval-Augmented Generation (RAG) systems typically treat documents as flat text, ignoring the structured metadata and linked relationships that knowledge graphs provide. We investigate whether structured linked data, specifically Schema.org markup and dereferenceable entity pages served by a Linked Data Platform, can improve retrieval accuracy and answer quality in both standard and agentic RAG systems. We conduct a controlled experiment across four domains (editorial, legal, travel, e-commerce), using Vertex AI Vector Search 2.0 for retrieval and the Google Agent Development Kit (ADK) for agentic reasoning. The design tests seven conditions: three document representations (plain HTML, HTML with JSON-LD, and an enhanced agentic-optimized entity page) crossed with two retrieval modes (standard RAG and agentic RAG with multi-hop link traversal), plus an Enhanced+ condition that adds rich navigational affordances and entity interlinking. JSON-LD markup alone yields only modest improvements, whereas our enhanced entity page format, which incorporates llms.txt-style agent instructions, breadcrumbs, and neural search capabilities, yields substantial gains: accuracy improves by 29.6% for standard RAG and by 29.8% for the full agentic pipeline. The Enhanced+ variant achieves the highest absolute scores (accuracy: 4.85/5, completeness: 4.55/5), though its incremental gain over the base enhanced format is not statistically significant. We release our dataset, evaluation framework, and enhanced entity page templates to support reproducibility.
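The seven-condition design described above can be sketched as a small enumeration. This is only an illustrative sketch: the identifiers (`plain_html`, `html_jsonld`, `enhanced`, `enhanced_plus`, `standard_rag`, `agentic_rag`) and the function name `build_conditions` are hypothetical labels chosen here, not names from the released framework, and the retrieval mode of the Enhanced+ condition is left unset because the abstract does not state it.

```python
from itertools import product
from typing import List, Optional, Tuple

# Hypothetical labels for the three document representations named in the abstract.
REPRESENTATIONS = ["plain_html", "html_jsonld", "enhanced"]
# Hypothetical labels for the two retrieval modes named in the abstract.
RETRIEVAL_MODES = ["standard_rag", "agentic_rag"]

Condition = Tuple[str, Optional[str]]


def build_conditions() -> List[Condition]:
    """Enumerate the experimental conditions: a 3 x 2 grid plus Enhanced+."""
    # Cross every representation with every retrieval mode (six conditions).
    conditions: List[Condition] = list(product(REPRESENTATIONS, RETRIEVAL_MODES))
    # Add the extra Enhanced+ condition; its retrieval mode is not stated in
    # the abstract, so it is recorded as None rather than guessed.
    conditions.append(("enhanced_plus", None))
    return conditions


if __name__ == "__main__":
    for representation, mode in build_conditions():
        print(representation, mode)
```

Keeping the grid and the extra condition separate mirrors the wording of the design: six crossed conditions, plus one additional variant that is compared against the base enhanced format.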
