Retrieval-augmented question-answering systems combine retrieval techniques with large language models to provide more accurate and informative answers. Many existing toolkits let users quickly build such systems from off-the-shelf models, but they offer little support for researchers and developers who want to customize how those models are trained, tested, and deployed. We propose LocalRQA, an open-source toolkit that features a wide selection of model training algorithms, evaluation methods, and deployment tools curated from the latest research. As a showcase, we build QA systems using online documentation obtained from the Databricks and Faire websites. We find that 7B models trained and deployed using LocalRQA perform comparably to systems using OpenAI's text-embedding-ada-002 and GPT-4-turbo.