Task-Oriented Dialogue (TOD) systems are designed to carry out specific tasks by tracking dialogue states and generating appropriate responses that help users achieve defined goals. Recently, end-to-end dialogue models pre-trained on large datasets have shown promising performance in conversational systems. However, they share the same parameters across the tasks of the dialogue system (natural language understanding, NLU; dialogue state tracking, DST; natural language generation, NLG), so debugging each task is challenging. They also require considerable effort to fine-tune a large number of parameters when building a task-oriented chatbot, which makes them difficult for non-experts to handle. Therefore, we aim to train models that are lightweight and fast relative to a pre-trained language model (PLM). In this paper, we propose an end-to-end TOD system with Task-Optimized Adapters, which learn independently per task and add only a small number of parameters after fixed layers of the pre-trained network. We also enhance the performance of the DST and NLG modules through reinforcement learning, overcoming the limitations of adapter-only learning and enabling natural, consistent response generation that is appropriate for the goal. Our method is model-agnostic and does not require prompt tuning: it takes only the input data, without a prompt. Experimental results show that our method achieves competitive performance on the MultiWOZ benchmark compared to existing end-to-end models. In particular, we attain state-of-the-art performance on the DST task of the MultiWOZ 2.2 dataset.
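To make the idea of per-task adapters on a frozen network concrete, the following is a minimal sketch in plain Python. It is not the architecture of this paper: the bottleneck design (down-projection, ReLU, up-projection, residual connection), the zero-initialized up-projection, the sizes `HIDDEN` and `BOTTLENECK`, and every name in the code are assumptions chosen for illustration. It shows only that each task can own a small set of parameters while the shared layer stays untouched.

```python
import random


def init_linear(n_out, n_in, rng, scale=0.02):
    """Create weights (list of rows) and a zero bias for y = W x + b."""
    w = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b


def linear(x, w, b):
    """Apply y = W x + b to a single vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def count_params(w, b):
    """Number of scalar parameters in one linear map."""
    return sum(len(row) for row in w) + len(b)


class Adapter:
    """Hypothetical bottleneck adapter: down-project, ReLU, up-project, add residual."""

    def __init__(self, hidden, bottleneck, rng):
        self.down = init_linear(bottleneck, hidden, rng)
        # Assumption: zero-initialize the up-projection so the adapter
        # starts out as the identity function on the frozen layer's output.
        self.up = ([[0.0] * bottleneck for _ in range(hidden)], [0.0] * hidden)

    def __call__(self, h):
        z = [max(0.0, v) for v in linear(h, *self.down)]
        delta = linear(z, *self.up)
        return [hi + di for hi, di in zip(h, delta)]

    def num_params(self):
        return count_params(*self.down) + count_params(*self.up)


HIDDEN, BOTTLENECK = 64, 4  # illustrative sizes, not taken from the paper
rng = random.Random(0)

# One shared layer standing in for the pre-trained network; it is never updated.
frozen_layer = init_linear(HIDDEN, HIDDEN, rng)

# One independent adapter per task, so each task can be trained and debugged alone.
adapters = {task: Adapter(HIDDEN, BOTTLENECK, rng) for task in ("NLU", "DST", "NLG")}


def forward(x, task):
    """Run the frozen layer, then the adapter that belongs to the given task."""
    h = linear(x, *frozen_layer)
    return adapters[task](h)


x = [rng.random() for _ in range(HIDDEN)]
frozen_out = linear(x, *frozen_layer)
dst_out = forward(x, "DST")
frozen_params = count_params(*frozen_layer)
adapter_params = adapters["DST"].num_params()
```

In this sketch only the entries of `adapters[task]` would receive gradient updates during training, so a problem in one task can be investigated by inspecting that task's adapter without touching the others or the shared layer.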