Integrating Large Language Models (LLMs) into public health policy work offers a transformative way to navigate the vast repositories of regulatory guidance maintained by agencies such as the Centers for Disease Control and Prevention (CDC). However, the propensity of LLMs to hallucinate, that is, to generate plausible but factually incorrect assertions, is a critical barrier to adopting these technologies in high-stakes environments where information integrity is non-negotiable. This empirical evaluation examines how effectively Retrieval-Augmented Generation (RAG) architectures mitigate that risk by grounding generative outputs in authoritative document context. Specifically, the study compares a baseline Vanilla LLM against a Basic RAG pipeline and an Advanced RAG pipeline that adds cross-encoder re-ranking. The experimental framework uses the Mistral-7B-Instruct-v0.2 model and the all-MiniLM-L6-v2 embedding model to process a corpus of official CDC policy analytical frameworks and guidance documents. The analysis evaluates the effect of two chunking strategies, recursive character-based splitting and token-based semantic splitting, on system accuracy, assessed with faithfulness and relevance scores across a curated set of complex policy scenarios. Quantitatively, Basic RAG substantially improves faithfulness over the Vanilla baseline (0.621 versus 0.347), and the Advanced RAG configuration reaches the highest average faithfulness, 0.797. These results indicate that two-stage retrieval is essential for the precision required in domain-specific policy question answering, although structural constraints in document segmentation remain a significant bottleneck for multi-step reasoning tasks.
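The two-stage retrieval idea referred to above (a fast first-stage retriever followed by a re-ranker that scores each query and passage pair jointly) can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: the function names, the bag-of-words cosine used in place of all-MiniLM-L6-v2 embeddings, and the bigram-overlap scorer used in place of a cross-encoder are toy stand-ins, not the pipeline evaluated in this study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_bow(query, passage):
    """Stage 1 stand-in (assumption): cosine similarity over bag-of-words counts.

    A real pipeline would compare dense embeddings here instead.
    """
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    dot = sum(q[t] * p[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in p.values())
    )
    return dot / norm if norm else 0.0


def joint_score(query, passage):
    """Stage 2 stand-in (assumption): score the (query, passage) pair jointly.

    Returns the fraction of query bigrams that also occur as adjacent tokens
    in the passage. A real pipeline would call a cross-encoder model here.
    """
    q_tokens, p_tokens = tokenize(query), tokenize(passage)
    q_bigrams = set(zip(q_tokens, q_tokens[1:]))
    p_bigrams = set(zip(p_tokens, p_tokens[1:]))
    return len(q_bigrams & p_bigrams) / len(q_bigrams) if q_bigrams else 0.0


def retrieve_then_rerank(query, chunks, first_stage_k=3, final_k=1):
    """Keep the top first_stage_k chunks by the cheap score, then re-rank them."""
    candidates = sorted(
        chunks, key=lambda c: cosine_bow(query, c), reverse=True
    )[:first_stage_k]
    reranked = sorted(candidates, key=lambda c: joint_score(query, c), reverse=True)
    return reranked[:final_k]
```

In this sketch the first stage is cheap enough to run over every chunk, while the second stage is applied only to the short candidate list, which is the usual motivation for splitting retrieval into two stages. The re-ranker can reorder candidates when the cheap score favors a passage that shares the query's words but not its phrasing.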