The advent of Large Language Models (LLMs) is transforming search engines into conversational AI search products, primarily using Retrieval-Augmented Generation (RAG) on web corpora. However, this paradigm has significant industrial limitations. Traditional RAG approaches struggle with real-time needs and structured queries that require accessing dynamically generated content like ticket availability or inventory. Limited to indexing static pages, search engines cannot perform the interactive queries needed for such time-sensitive data. Academic research has focused on optimizing RAG for static content, overlooking complex intents and the need for dynamic sources like databases and real-time APIs. To bridge this gap, we introduce TURA (Tool-Augmented Unified Retrieval Agent for AI Search), a novel three-stage framework that combines RAG with agentic tool-use to access both static content and dynamic, real-time information. TURA has three key components: an Intent-Aware Retrieval module to decompose queries and retrieve information sources encapsulated as Model Context Protocol (MCP) Servers, a DAG-based Task Planner that models task dependencies as a Directed Acyclic Graph (DAG) for optimal parallel execution, and a lightweight Distilled Agent Executor for efficient tool calling. TURA is the first architecture to systematically bridge the gap between static RAG and dynamic information sources for a world-class AI search product. Serving tens of millions of users, it leverages an agentic framework to deliver robust, real-time answers while meeting the low-latency demands of a large-scale industrial system.
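The DAG-based planning idea described above can be sketched in a few lines. The snippet below is a hypothetical illustration, not the paper's implementation: task dependencies form a Directed Acyclic Graph, tasks are grouped into topological levels, and all tool calls within a level run in parallel since they share no dependency path. All names (`topological_levels`, `execute_plan`, the example task ids) are illustrative assumptions.

```python
# Hypothetical sketch of DAG-based task planning with parallel execution.
# Not the paper's code: names and structure are illustrative assumptions.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def topological_levels(deps):
    """Group task ids into topological levels.

    deps maps each task to the list of tasks it depends on. Tasks in the
    same level have no dependency path between them, so their tool calls
    can execute concurrently.
    """
    indegree = {task: len(parents) for task, parents in deps.items()}
    children = defaultdict(list)
    for task, parents in deps.items():
        for parent in parents:
            children[parent].append(task)
    level = [task for task, n in indegree.items() if n == 0]
    levels = []
    while level:
        levels.append(sorted(level))
        nxt = []
        for task in level:
            for child in children[task]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    nxt.append(child)
        level = nxt
    return levels

def execute_plan(deps, call_tool):
    """Execute one level at a time; tasks inside a level run in parallel."""
    results = {}
    for level in topological_levels(deps):
        with ThreadPoolExecutor() as pool:
            for task, output in zip(level, pool.map(call_tool, level)):
                results[task] = output
    return results

# Example: ticket availability and weather are independent lookups; the
# final answer depends on both, so it runs in a second level.
deps = {"tickets": [], "weather": [], "answer": ["tickets", "weather"]}
print(topological_levels(deps))  # → [['tickets', 'weather'], ['answer']]
```

Grouping by levels is the simplest scheduling policy; a production planner could instead dispatch each task the moment its last dependency finishes, which yields strictly better latency on unbalanced graphs.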