Retrieval-Augmented Generation (RAG) is widely used to ground large language models in external knowledge sources. However, when applied to heterogeneous corpora and multi-step queries, naive RAG pipelines often degrade in quality because they rely on flat knowledge representations and lack explicit workflows. In this work, we introduce DCD (Domain-Collection-Document), a domain-oriented design for structuring knowledge and controlling query processing in RAG systems without modifying the underlying language model. The approach relies on a hierarchical decomposition of the information space and on multi-stage routing based on structured model outputs, which progressively restricts both the retrieval scope and the generation scope. The architecture is complemented by smart chunking, hybrid retrieval, and integrated validation and generation guardrail mechanisms. We describe the DCD architecture and workflow and discuss evaluation results on a synthetic evaluation dataset, highlighting the impact of the approach on robustness, factual accuracy, and answer relevance in applied RAG scenarios.
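To make the idea of progressive scope restriction concrete, the following is a minimal sketch of routing a query through a Domain-Collection-Document hierarchy. It is not the authors' implementation: the class names, the `route` and `dcd_retrieve` functions, the toy corpus, and the structured decision format are all hypothetical, and a simple keyword-overlap scorer stands in for the language model that would presumably produce the structured routing output in a real system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str


@dataclass
class Collection:
    name: str
    description: str
    documents: list[Document] = field(default_factory=list)


@dataclass
class Domain:
    name: str
    description: str
    collections: list[Collection] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    """Lowercased whitespace tokens; a deliberately crude tokenizer."""
    return set(text.lower().split())


def route(query: str, candidates: list, describe) -> dict:
    """Hypothetical stand-in for a model call that returns a structured decision.

    Scores each candidate by keyword overlap with the query and returns
    {"choice": index, "score": overlap}. Assumes `candidates` is non-empty.
    """
    q = tokens(query)
    scores = [len(q & tokens(describe(c))) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return {"choice": best, "score": scores[best]}


def dcd_retrieve(query: str, domains: list[Domain]) -> dict:
    """Narrow the search scope one level at a time: domain, collection, document."""
    d = route(query, domains, lambda x: x.description)
    domain = domains[d["choice"]]

    c = route(query, domain.collections, lambda x: x.description)
    collection = domain.collections[c["choice"]]

    # Only documents inside the selected collection are ever considered.
    doc = route(query, collection.documents, lambda x: x.text)
    document = collection.documents[doc["choice"]]

    return {
        "domain": domain.name,
        "collection": collection.name,
        "document": document.doc_id,
    }


# Toy corpus, invented purely for illustration.
corpus = [
    Domain("hr", "employee leave vacation payroll benefits", [
        Collection("leave", "vacation sick leave policy", [
            Document("hr-1", "employees accrue vacation days monthly"),
            Document("hr-2", "sick leave requires a medical note"),
        ]),
        Collection("payroll", "salary payroll tax", [
            Document("hr-3", "salary is paid on the last working day"),
        ]),
    ]),
    Domain("it", "laptop vpn password software access", [
        Collection("access", "vpn password reset account", [
            Document("it-1", "reset your password through the self-service portal"),
            Document("it-2", "vpn access requires a hardware token"),
        ]),
    ]),
]

result = dcd_retrieve("how do i reset my vpn password", corpus)
print(result)
```

Each routing step sees only the children of the previous choice, so a wrong document in another domain can never be retrieved once the domain is fixed. The cost of this design is that an early routing mistake cannot be recovered later, which is presumably one reason the abstract pairs routing with validation mechanisms.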