Task-Oriented Dialogue (TOD) systems carry out specific tasks by tracking dialogue states and generating appropriate responses that help users achieve defined goals. Recently, end-to-end dialogue models pre-trained on large datasets have shown promising performance in conversational systems. However, they train all tasks of the dialogue system, namely natural language understanding (NLU), dialogue state tracking (DST), and natural language generation (NLG), with the same shared parameters, which makes debugging each task challenging. They also require considerable effort to fine-tune a large number of parameters to create a task-oriented chatbot, making them difficult for non-experts to handle. We therefore aim to train models that are lightweight and fast relative to pre-trained language models (PLMs). In this paper, we propose an end-to-end TOD system with Task-Optimized Adapters, which learn independently per task and add only a small number of parameters after fixed layers of the pre-trained network. We also enhance the performance of the DST and NLG modules through reinforcement learning, compensating for the limited learning capacity of adapter training and enabling natural, consistent response generation that is appropriate for the goal. Our method is model-agnostic and does not require prompt tuning, as it uses only input data without a prompt. Experimental results show that our method achieves competitive performance on the MultiWOZ benchmark compared to existing end-to-end models. In particular, we attain state-of-the-art performance on the DST task of the MultiWOZ 2.2 dataset.
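To make the idea of per-task adapters added after fixed pre-trained layers more concrete, here is a minimal sketch in plain Python. It assumes a generic bottleneck adapter (down-projection, nonlinearity, up-projection, residual connection); the abstract does not specify the adapter architecture, so this is not the authors' implementation. The names `FrozenLayer`, `Adapter`, `forward`, and the sizes `DIM` and `BOTTLENECK` are hypothetical and chosen only for illustration.

```python
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


class FrozenLayer:
    """Stand-in for one pre-trained layer; its weights are never updated."""

    def __init__(self, dim, rng):
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]

    def __call__(self, x):
        return matvec(self.W, x)

    def num_params(self):
        return len(self.W) * len(self.W[0])


class Adapter:
    """Assumed bottleneck adapter: down-project, ReLU, up-project, residual."""

    def __init__(self, dim, bottleneck, rng):
        self.down = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                     for _ in range(bottleneck)]
        # Zero-initialised up-projection: the adapter starts as the identity,
        # so the frozen layer's output is unchanged before any training.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, h):
        delta = matvec(self.up, relu(matvec(self.down, h)))
        return [a + b for a, b in zip(h, delta)]

    def num_params(self):
        return 2 * len(self.down) * len(self.down[0])


rng = random.Random(0)
DIM, BOTTLENECK = 64, 4  # hypothetical sizes

backbone = FrozenLayer(DIM, rng)  # shared by all tasks, kept fixed
# One independent adapter per task; only these would receive gradient updates.
adapters = {task: Adapter(DIM, BOTTLENECK, rng) for task in ("NLU", "DST", "NLG")}


def forward(x, task):
    """Run the frozen layer, then the adapter belonging to the given task."""
    return adapters[task](backbone(x))


x = [rng.uniform(-1, 1) for _ in range(DIM)]
print(backbone.num_params(), adapters["DST"].num_params())  # 4096 512
```

In this sketch each task adds 512 trainable parameters on top of a 4096-parameter frozen layer, and because the adapters are separate objects, one task's adapter can be inspected or retrained without touching the others.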