A task-oriented dialogue (TOD) system is designed to accomplish user-defined tasks through dialogue. TOD systems have progressed toward end-to-end modeling by leveraging pre-trained large language models. However, fine-tuning pre-trained language models with supervised learning alone leads to exposure bias and the token-loss problem, causing the models to deviate from completing the user's task. To address these issues, we propose a TOD system that uses a unified pre-trained language model, GPT2, as its base model and is optimized with both supervised learning and reinforcement learning (RL). The issues above are mitigated by a non-differentiable reward function, computed as a weighted sum of the success rate and the BLEU evaluation metric. In the reward, the success rate guides the language model toward completing the user's task, while BLEU encourages coherent and fluent responses. Our model is obtained by fine-tuning the pre-trained model at the dialogue-session level, where each session comprises user utterances, belief states, system acts, and system responses. Experimental results on MultiWOZ2.1 demonstrate that our model improves the inform rate by 1.60% and the success rate by 3.17% over the baseline.
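The weighted-sum reward described above can be sketched as follows. This is a minimal illustration: the weight values and the function name are assumptions, since the abstract does not specify the actual weighting used in the paper.

```python
def dialogue_reward(success_rate: float, bleu: float,
                    w_success: float = 0.5, w_bleu: float = 0.5) -> float:
    """Non-differentiable reward for RL fine-tuning of the TOD model.

    success_rate: fraction of user goals completed in the dialogue (0..1).
    bleu: BLEU score of the generated system responses (0..1).
    w_success, w_bleu: illustrative weights; the paper's values are not
    given in this abstract.
    """
    return w_success * success_rate + w_bleu * bleu
```

Because the reward depends on session-level metrics (success rate, BLEU) rather than per-token likelihoods, it cannot be backpropagated directly, which is why a policy-gradient-style RL objective is needed alongside the supervised loss.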