Enterprise chatbots powered by generative AI are emerging as key applications for enhancing employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and orchestration frameworks such as LangChain and LlamaIndex are crucial for building these chatbots. However, creating effective enterprise chatbots is challenging and requires meticulous RAG pipeline engineering. This includes fine-tuning embeddings and LLMs, retrieving documents from vector databases, rephrasing queries, reranking results, designing prompts, honoring document access controls, providing concise responses, including references, safeguarding personal information, and building orchestration agents. We present a framework for building RAG-based chatbots, drawing on our experience with three NVIDIA chatbots: one for IT/HR benefits, one for financial earnings, and one for general content. Our contributions are three-fold: we introduce the FACTS framework (Freshness, Architectures, Cost, Testing, Security), present fifteen RAG pipeline control points, and provide empirical results on accuracy-latency tradeoffs between large and small LLMs. To the best of our knowledge, this is the first paper to provide a holistic view of both the factors and the solutions involved in building secure enterprise-grade chatbots.