Large language models (LLMs) with retrieval-augmented generation (RAG) have been the preferred choice for scalable generative AI solutions in the recent past. Although RAG implemented with AI agents (agentic RAG) has recently been popularized, it suffers from unstable cost and unreliable performance for enterprise-level data practices. Most existing use cases that incorporate RAG with LLMs have been either generic or extremely domain-specific, which calls into question the scalability and generalizability of RAG-LLM approaches. In this work, we propose a unique LLM-based system in which multiple LLMs can be invoked to enable data authentication, user-query routing, data retrieval, and custom prompting for question answering over enterprise data tables. The source tables are large and highly fluctuating, and the proposed framework returns structured responses in under 10 seconds per query. Additionally, we propose a five-metric scoring module that detects and reports hallucinations in the LLM responses. Our proposed system and scoring metrics achieve >90% confidence scores across hundreds of user queries in the sustainability, financial health, and social media domains. Extensions to the proposed extreme RAG architecture can enable heterogeneous source querying using LLMs.