Task-oriented dialog systems have made substantial progress thanks to conversational pre-training techniques, yet two significant challenges persist. First, most systems feed only the latest turn's state label to the generator. This practice overlooks the value that the full sequence of state labels has for improving the model's understanding and, in turn, its subsequent generation. Second, overreliance on the generated policy often leads to error accumulation: when the model follows an incorrect action, the resulting response is suboptimal. To address the first challenge, we propose turn-level multi-task objectives for the encoder. Guided by the essential information in labeled intermediate states, the encoder learns a more robust representation for both understanding and generation. For the decoder, we introduce an action tree-based scheduled sampling technique. Specifically, we model the hierarchical policy as trees and use the similarity between trees to sample negative policies under a scheduled sampling regime, so that the model learns to generate responses that remain invariant under such perturbations. By sampling negative policies that are similar to the gold policy, this method simulates the errors the model is likely to make at inference time and thus bridges the gap between task-oriented dialog training and inference. Among methods without continual pre-training, our approach achieves state-of-the-art (SOTA) performance on the MultiWOZ dataset series and is also competitive with pre-trained SOTA methods.
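As a rough illustration of the decoder-side idea, the sketch below shows one way tree similarity and scheduled sampling could be combined. It is not the authors' implementation: the abstract does not specify the tree encoding, the similarity metric, or the sampling schedule, so the (domain, intent, slot) action format, the Jaccard overlap of tree paths, the linear decay schedule, and all function names here are assumptions made for illustration.

```python
import random

def tree_paths(actions):
    """Flatten a policy tree into the set of its root-to-node paths.

    `actions` is a list of (domain, intent, slot) triples; this three-level
    layout is an assumed encoding of the hierarchical policy.
    """
    paths = set()
    for domain, intent, slot in actions:
        paths.add((domain,))
        paths.add((domain, intent))
        paths.add((domain, intent, slot))
    return paths

def tree_similarity(a, b):
    """Jaccard overlap of the two trees' node paths (an assumed metric)."""
    pa, pb = tree_paths(a), tree_paths(b)
    if not pa and not pb:
        return 1.0
    return len(pa & pb) / len(pa | pb)

def gold_probability(step, total_steps, floor=0.0):
    """Probability of keeping the gold policy; decays linearly (assumed schedule)."""
    return max(floor, 1.0 - step / total_steps)

def sample_policy(gold, candidates, step, total_steps, rng, floor=0.0):
    """Return the gold policy or a similarity-weighted negative policy."""
    if rng.random() < gold_probability(step, total_steps, floor):
        return gold
    negatives = [c for c in candidates if set(c) != set(gold)]
    if not negatives:
        return gold
    # Negatives that share more of the gold tree are sampled more often.
    weights = [tree_similarity(gold, c) for c in negatives]
    if sum(weights) == 0:
        return rng.choice(negatives)
    return rng.choices(negatives, weights=weights, k=1)[0]

if __name__ == "__main__":
    gold = [("hotel", "inform", "price")]
    pool = [[("hotel", "inform", "area")], [("train", "request", "time")]]
    rng = random.Random(0)
    for step in (0, 50, 100):
        print(step, sample_policy(gold, pool, step, 100, rng))
```

Early in training the decoder would almost always be conditioned on the gold policy; as `step` grows, it increasingly sees near-miss policies, which is the kind of perturbation the response is meant to be robust to.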