An ideal dialogue system should continuously acquire new skills and adapt to new tasks while retaining prior knowledge. Dialogue State Tracking (DST), a vital component of such systems, often requires learning new services and confronts catastrophic forgetting, along with a critical capability loss we term the "Value Selection Quandary." To address these challenges, we introduce Reason-of-Select (RoS) distillation, which equips smaller models with a novel "meta-reasoning" capability. Meta-reasoning adopts an enhanced multi-domain perspective, combining fragments of meta-knowledge from domain-specific dialogues during continual learning and thereby transcending traditional single-perspective reasoning. This domain bootstrapping process sharpens the model's ability to dissect intricate dialogues involving multiple candidate values, and its domain-agnostic property aligns data distributions across domains, effectively mitigating forgetting. Additionally, two novel improvements, a "multi-value resolution" strategy and a Semantic Contrastive Reasoning Selection method, further strengthen RoS by generating DST-specific selection chains and mitigating hallucinations in the teacher's reasoning, ensuring effective and reliable knowledge transfer. Extensive experiments validate the exceptional performance and robust generalization of our method. The source code is provided for reproducibility.