Conversational Question Answering (ConvQA) involves multiple subtasks: (i) understanding incomplete questions in their conversational context, (ii) retrieving relevant information, and (iii) generating answers. This work presents PRAISE, a pipeline-based approach to ConvQA that trains an LLM adapter for each of the three subtasks. Since labeled training data for the individual subtasks is unavailable in practice, PRAISE learns from its own generations, using final answering performance as the feedback signal without human intervention, and treats intermediate information, such as relevant evidence, as weakly labeled data. We apply Direct Preference Optimization (DPO) by contrasting successful and unsuccessful samples for each subtask. Our experiments demonstrate the effectiveness of this training paradigm: PRAISE improves on every subtask and achieves new state-of-the-art performance on a popular ConvQA benchmark, with a 15.5 percentage point gain in precision over baselines.
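The core training recipe, contrasting successful and unsuccessful self-generated samples under DPO, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the pairing strategy, the `is_correct` scorer, and the `beta` value are hypothetical placeholders; in practice the feedback signal is the pipeline's final answering performance.

```python
import math

def build_preference_pairs(samples, is_correct):
    """Pair each successful generation (chosen) with an unsuccessful
    one (rejected), turning answer-level feedback into DPO training data.
    `is_correct` is a hypothetical scorer based on final answer quality."""
    winners = [s for s in samples if is_correct(s)]
    losers = [s for s in samples if not is_correct(s)]
    return [(w, l) for w in winners for l in losers]

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO objective for one preference pair:
    -log sigmoid(beta * (policy log-ratio margin vs. reference model))."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

For example, if three sampled generations are scored and only one yields a correct final answer, `build_preference_pairs` produces two (chosen, rejected) pairs from it; `dpo_loss` then decreases as the policy assigns relatively higher likelihood to the chosen sample than the reference model does.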