In this work, we introduce ChatQA, a family of conversational question answering (QA) models that achieve GPT-4-level accuracy. Specifically, we propose a two-stage instruction tuning method that can significantly improve the zero-shot conversational QA results of large language models (LLMs). To handle retrieval-augmented generation in conversational QA, we fine-tune a dense retriever on a multi-turn QA dataset; this yields results comparable to those of the state-of-the-art query rewriting model while substantially reducing deployment cost. Notably, our ChatQA-70B outperforms GPT-4 in average score across 10 conversational QA datasets (54.14 vs. 53.90) without relying on any synthetic data from OpenAI GPT models.
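The fine-tuned retriever replaces a separate query-rewriting step: rather than first rewriting a follow-up question such as "What about tablets?" into a standalone query, the retriever consumes the conversation directly. The sketch below shows what that could look like at inference time. It assumes the retriever's query is the dialogue history concatenated with the current question. The turn format, `build_conversational_query`, `toy_encode`, `retrieve`, and `DIM` are all hypothetical. `toy_encode` is only a hashed bag-of-words placeholder that stands in for the actual fine-tuned dense encoder, whose architecture and training are not specified here.

```python
import math
import zlib
from typing import List, Tuple

DIM = 256  # hypothetical embedding size for the placeholder encoder


def build_conversational_query(history: List[Tuple[str, str]], question: str) -> str:
    """Concatenate prior turns and the current question into one retriever query.

    Assumed format: one "role: text" line per turn, no query rewriting.
    """
    turns = [f"{role}: {text}" for role, text in history]
    turns.append(f"user: {question}")
    return "\n".join(turns)


def toy_encode(text: str) -> List[float]:
    """Placeholder for a fine-tuned dense encoder (hypothetical).

    Hashes lowercase tokens into a fixed-size vector with crc32 (deterministic
    across runs) and L2-normalizes it, so a dot product equals cosine similarity.
    """
    vec = [0.0] * DIM
    for token in text.lower().split():
        token = token.strip(".,?!:")
        if token:
            vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def retrieve(history: List[Tuple[str, str]], question: str,
             chunks: List[str], top_k: int = 1) -> List[str]:
    """Rank context chunks by dot product with the multi-turn query embedding."""
    query_vec = toy_encode(build_conversational_query(history, question))
    scored = [
        (sum(q * c for q, c in zip(query_vec, toy_encode(chunk))), chunk)
        for chunk in chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


if __name__ == "__main__":
    history = [
        ("user", "What is the refund policy for laptops?"),
        ("assistant", "Laptops can be returned within 30 days."),
    ]
    chunks = [
        "Tablets can be returned within 14 days of purchase.",
        "Our office is open Monday to Friday.",
    ]
    print(retrieve(history, "What about tablets?", chunks))
```

Because the history is part of the query, terms from earlier turns ("returned", "within", "days") contribute to matching even though the follow-up question never mentions them. In a real system, the placeholder encoder would be replaced by the fine-tuned dense retriever.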