Retrieval-Augmented Generation (RAG) is gaining recognition as one of the key technological pillars of next-generation information retrieval, owing to its ability to mitigate hallucination in Large Language Models (LLMs) and to incorporate up-to-date information effectively. However, building a high-quality retrieval system independently requires specialized expertise; moreover, because RAG involves both a retrieval stage and a generation stage, it is relatively slower than conventional retrieval-only systems. Accordingly, this study proposes SHRAG, a novel framework designed to integrate Information Retrieval and RAG seamlessly while maintaining precise retrieval performance. SHRAG uses a Large Language Model as a Query Strategist that automatically transforms unstructured natural-language queries into logically structured search queries and then performs Boolean retrieval, emulating the search process of an expert human searcher. It also incorporates multilingual query expansion and a multilingual embedding model, enabling efficient cross-lingual question answering within the multilingual dataset environment of the ScienceON Challenge. Experimental results show that the proposed method, by combining logical retrieval with generative reasoning, can significantly improve the accuracy and reliability of RAG systems. Finally, SHRAG moves beyond conventional document-centric retrieval, pointing toward a new search paradigm that provides direct and reliable answers to queries.
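The abstract does not specify SHRAG's structured-query syntax or its retrieval engine. The following is only a minimal sketch of Boolean retrieval over an inverted index, assuming a hypothetical nested-tuple query format (`("AND", ...)`, `("OR", ...)`, `("NOT", ...)`) and hypothetical helpers `build_index` and `evaluate`. In SHRAG, the structured query would be produced by the LLM Query Strategist; here it is written by hand. The `OR` over an English term and its Korean counterpart is an assumed illustration of how multilingual query expansion could feed a Boolean query.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical query representation: a bare term (str), or an operator tuple
# such as ("AND", q1, q2, ...), ("OR", q1, q2, ...), or ("NOT", q).
Query = Union[str, Tuple]


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased whitespace token to the set of doc ids containing it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index.setdefault(token, set()).add(doc_id)
    return index


def evaluate(query: Query, index: Dict[str, Set[str]], all_ids: Set[str]) -> Set[str]:
    """Recursively evaluate a Boolean query tree against the inverted index."""
    if isinstance(query, str):
        # Leaf: return a copy of the posting set so callers can mutate it safely.
        return set(index.get(query.lower(), set()))
    op, *args = query
    if op == "AND":
        result = evaluate(args[0], index, all_ids)
        for arg in args[1:]:
            result &= evaluate(arg, index, all_ids)  # set intersection
        return result
    if op == "OR":
        result: Set[str] = set()
        for arg in args:
            result |= evaluate(arg, index, all_ids)  # set union
        return result
    if op == "NOT":
        return all_ids - evaluate(args[0], index, all_ids)  # complement within corpus
    raise ValueError(f"unknown operator: {op}")


# Toy corpus mixing English and Korean (illustrative data, not from ScienceON).
docs = {
    "d1": "retrieval augmented generation reduces hallucination",
    "d2": "검색 증강 생성 환각 감소",
    "d3": "boolean retrieval for patent search",
}
index = build_index(docs)

# Hand-written stand-in for a Query Strategist output: the English term is
# expanded with a Korean equivalent via OR, and patent documents are excluded.
query = ("AND", ("OR", "hallucination", "환각"), ("NOT", "patent"))
hits = evaluate(query, index, set(docs))
print(sorted(hits))  # ['d1', 'd2']
```

Set operations keep the sketch short and make the AND/OR/NOT semantics explicit. A production system would more likely delegate this to a search engine's Boolean query parser and combine it with embedding-based ranking, as the abstract's mention of a multilingual embedding model suggests.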