Zero-shot dialogue state tracking (DST) seeks to enable dialogue systems to transition to unfamiliar domains without manual annotation or extensive retraining. Prior research has approached this objective by embedding prompts into language models (LMs). Common methodologies include integrating prompts at the input layer or introducing learnable variables at each transformer layer. Nonetheless, each strategy exhibits inherent limitations. Prompts integrated at the input layer risk underutilization, with their impact potentially diminishing across successive transformer layers. Conversely, adding learnable variables to each layer complicates training and increases inference latency. To address these issues, this paper proposes Dual Low-Rank Adaptation (DualLoRA), a plug-and-play architecture designed for zero-shot DST. DualLoRA incorporates two distinct Low-Rank Adaptation (LoRA) components, targeting dialogue context processing and prompt optimization respectively, to ensure that prompts exert influence throughout the transformer layers. This is achieved without additional inference latency, allowing efficient integration into existing architectures. Through rigorous evaluation on the MultiWOZ and SGD datasets, DualLoRA demonstrates notable improvements across multiple domains, outperforming baseline methods in zero-shot settings. Our code is accessible at: \url{https://github.com/suntea233/DualLoRA}.
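To make the dual-adapter idea concrete, the following is a minimal NumPy sketch of a linear layer with two low-rank adapters, one routed to dialogue-context tokens and one to prompt tokens. All names, shapes, and the token-routing scheme here are illustrative assumptions, not the authors' exact design; it only shows how two LoRA branches can coexist and later be folded into the base weight so inference adds no extra matrix multiplications.

```python
import numpy as np

class DualLoRALinear:
    """Illustrative sketch (not the paper's exact architecture): a frozen
    linear layer with two low-rank adapters, one for dialogue-context
    tokens and one for prompt tokens."""

    def __init__(self, d_in, d_out, rank=4, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.02, size=(d_in, d_out))  # frozen base weight
        # Context adapter: low-rank update delta_c = A_c @ B_c, rank << d
        self.A_c = rng.normal(scale=0.02, size=(d_in, rank))
        self.B_c = np.zeros((rank, d_out))  # zero-init so training starts at identity
        # Prompt adapter: a second, independent low-rank update
        self.A_p = rng.normal(scale=0.02, size=(d_in, rank))
        self.B_p = np.zeros((rank, d_out))

    def forward(self, x, is_prompt):
        """x: (seq, d_in); is_prompt: boolean mask (seq,) routing each token
        to the prompt adapter (True) or the context adapter (False)."""
        base = x @ self.W
        delta_c = (x @ self.A_c) @ self.B_c
        delta_p = (x @ self.A_p) @ self.B_p
        return base + np.where(is_prompt[:, None], delta_p, delta_c)

    def merge(self):
        """Fold each adapter into a per-branch dense weight. After merging,
        a forward pass is a single matmul per token, so the adapters add
        no inference latency."""
        W_c = self.W + self.A_c @ self.B_c
        W_p = self.W + self.A_p @ self.B_p
        return W_c, W_p
```

With the `B` matrices zero-initialized, the layer initially reproduces the frozen base output exactly; only training moves the two branches apart.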