This paper analyzes open-source large language models (LLMs) and their application to Retrieval-Augmented Generation (RAG) tasks, specifically on enterprise data sets scraped from the enterprises' own websites. With the increasing reliance on LLMs in natural language processing, it is crucial to evaluate their performance, accessibility, and integration within specific organizational contexts. This study examines various open-source LLMs, explores their integration into RAG frameworks using enterprise-specific data, and assesses how different open-source embeddings perform in the retrieval and generation process. Our findings indicate that open-source LLMs, combined with effective embedding techniques, can significantly improve the accuracy and efficiency of RAG systems, offering enterprises a viable alternative to proprietary solutions.