Zero-shot transfer learning for Dialogue State Tracking (DST) helps handle a variety of task-oriented dialogue domains without the cost of collecting in-domain data. Existing work mainly studies common data- or model-level augmentation methods to improve generalization, but fails to effectively decouple the semantics of samples, which limits the zero-shot performance of DST. In this paper, we present a simple and effective "divide, conquer and combine" solution that explicitly disentangles the semantics of seen data and improves performance and robustness through a mixture-of-experts mechanism. Specifically, we divide the seen data into semantically independent subsets and train a corresponding expert for each; unseen samples are then mapped to these experts and inferred with our designed ensemble inference. Extensive experiments on MultiWOZ2.1 with T5-Adapter show that our scheme significantly and consistently improves zero-shot performance, achieving SOTA results in settings without external knowledge, with only 10M trainable parameters.
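The "divide, conquer and combine" flow can be illustrated with a minimal sketch. The abstract does not say how the subsets are formed or how the experts are combined, so everything below is an assumption made for illustration: nearest-centroid assignment for the divide step, softmax weighting over negative squared distances for the mapping step, and output-level averaging for the combine step. The helper names (`sq_dist`, `assign_to_subsets`, `expert_weights`, `ensemble_predict`) are hypothetical and not taken from the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_subsets(embeddings: List[Vector], centroids: List[Vector]) -> List[int]:
    """Divide (assumed): put each seen sample in the subset of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: sq_dist(e, centroids[k]))
        for e in embeddings
    ]


def expert_weights(embedding: Vector, centroids: List[Vector],
                   temperature: float = 1.0) -> List[float]:
    """Map (assumed): softmax over negative squared distances to each centroid."""
    logits = [-sq_dist(embedding, c) / temperature for c in centroids]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(weights: List[float],
                     expert_outputs: List[Dict[str, float]]) -> str:
    """Combine (assumed): weighted average of per-expert slot-value distributions."""
    combined: Dict[str, float] = {}
    for w, dist in zip(weights, expert_outputs):
        for value, p in dist.items():
            combined[value] = combined.get(value, 0.0) + w * p
    return max(combined, key=combined.get)


# Toy usage: two subsets, one expert per subset (expert outputs are made up).
centroids = [[0.0, 0.0], [10.0, 10.0]]
subset_ids = assign_to_subsets([[1.0, 0.0], [9.0, 10.0]], centroids)
weights = expert_weights([1.0, 1.0], centroids, temperature=50.0)
prediction = ensemble_predict(
    [0.9, 0.1],
    [{"cheap": 0.6, "expensive": 0.4}, {"cheap": 0.1, "expensive": 0.9}],
)
```

In this sketch each expert would be trained only on the samples assigned to its subset (the conquer step, omitted here), and an unseen sample closer to one centroid gives that subset's expert a larger say in the final prediction.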