In enterprise settings, efficiently retrieving relevant information from large, complex knowledge bases is essential for operational productivity and informed decision-making. This research presents a systematic empirical framework for metadata enrichment that uses large language models (LLMs) to improve document retrieval in Retrieval-Augmented Generation (RAG) systems. Our structured pipeline dynamically generates meaningful metadata for document segments, substantially improving their semantic representations and retrieval accuracy. Using a controlled 3 × 3 experimental matrix, we compare three chunking strategies (semantic, recursive, and naive) and evaluate how each interacts with three embedding techniques (content-only, TF-IDF weighted, and prefix-fusion), isolating the contribution of each component through ablation analysis. Metadata-enriched approaches consistently outperform content-only baselines. Recursive chunking paired with TF-IDF weighted embeddings yields 82.5% precision, while naive chunking with prefix-fusion achieves the strongest ranking quality (NDCG 0.813). We generate silver-standard ground truth through cross-encoder reranking and confirm statistical significance with Bonferroni-corrected paired t-tests. These findings confirm that metadata enrichment improves vector-space organization and retrieval effectiveness while keeping P95 latency below 30 ms. The results provide a quantitative decision framework for deploying high-performance, scalable RAG systems in the enterprise.
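The following Python sketch makes the two reported retrieval metrics concrete by computing precision@k and NDCG@k for a single query. It is not the authors' evaluation code. The abstract does not specify several details, so the sketch makes illustrative assumptions about them: the cutoff k, the linear gain form rel_i / log2(i + 1), and the premise that cross-encoder scores have already been mapped to graded relevance labels to form the silver-standard ground truth. The function names `precision_at_k`, `dcg`, and `ndcg_at_k` are hypothetical.

```python
import math
from typing import Sequence


def precision_at_k(retrieved: Sequence[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved chunk IDs that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for chunk_id in retrieved[:k] if chunk_id in relevant)
    # Standard precision@k divides by k, even if fewer than k items were retrieved.
    return hits / k


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain, assuming the linear-gain form rel_i / log2(i + 1)
    with ranks i starting at 1."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(retrieved: Sequence[str], relevance: dict[str, float], k: int) -> float:
    """NDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    `relevance` maps chunk IDs to graded labels (assumed here to be derived from
    cross-encoder scores); chunks missing from it count as gain 0.
    """
    gains = [relevance.get(chunk_id, 0.0) for chunk_id in retrieved[:k]]
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    ideal = dcg(ideal_gains)
    return dcg(gains) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical silver-standard labels for one query.
    labels = {"c1": 3.0, "c2": 2.0, "c3": 1.0}
    ranking = ["c2", "c9", "c1"]  # system output; c9 is not relevant
    print(precision_at_k(ranking, set(labels), k=3))  # 2 of the top 3 are relevant -> 2/3
    print(ndcg_at_k(ranking, labels, k=3))
```

In this toy query, precision@3 is 2/3 and NDCG@3 is about 0.735. DCG is 2/1 + 0 + 3/2 = 3.5, and the ideal ordering (3, 2, 1) gives 3 + 2/log2(3) + 0.5. Figures such as the reported NDCG of 0.813 would typically be means of these per-query scores over the evaluation set. Precision rewards only which chunks appear in the top k, whereas NDCG also rewards placing the most relevant chunks first. This is why the two metrics can favor different chunking and embedding combinations.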