Dialogue State Tracking (DST) is crucial for understanding user needs and executing appropriate system actions in task-oriented dialogues. The majority of existing DST methods are designed to work within predefined ontologies and assume the availability of gold domain labels, and therefore struggle to adapt to new slot values. While Large Language Model (LLM)-based systems show promising zero-shot DST performance, they either require extensive computational resources or underperform existing fully-trained systems, limiting their practicality. To address these limitations, we propose a zero-shot, open-vocabulary system that integrates domain classification and DST in a single pipeline. Our approach includes reformulating DST as a question-answering task for less capable models and employing self-refining prompts for more adaptable ones. Our system does not rely on fixed slot values defined in the ontology, allowing it to adapt dynamically. We compare our approach with the existing SOTA and show that it provides up to 20% better Joint Goal Accuracy (JGA) than previous methods on datasets such as Multi-WOZ 2.1, with up to 90% fewer requests to the LLM API.
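The DST-as-question-answering reformulation described above can be sketched as follows. This is a minimal illustration, not the paper's exact prompts: the slot questions and formatting here are hypothetical, but they show how each (domain, slot) pair becomes a natural-language question over the dialogue context, so values are extracted open-vocabulary rather than selected from a fixed ontology.

```python
# Hypothetical question templates per (domain, slot) pair; the actual
# templates used by the system may differ.
SLOT_QUESTIONS = {
    ("hotel", "price_range"): "What price range of hotel does the user want?",
    ("restaurant", "food"): "What type of food is the user looking for?",
}

def build_qa_prompt(dialogue_history: list[str], domain: str, slot: str) -> str:
    """Reformulate one slot as a QA prompt over the dialogue context.

    The LLM answers with a free-form value (open vocabulary) or 'none',
    so no fixed list of slot values from an ontology is required.
    """
    context = "\n".join(dialogue_history)
    question = SLOT_QUESTIONS[(domain, slot)]
    return (
        f"Dialogue:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer with the value only, or 'none' if not mentioned.\n"
        "Answer:"
    )

# Example usage: one prompt per active slot, sent to the LLM.
prompt = build_qa_prompt(
    ["User: I need a cheap hotel in the centre."],
    "hotel",
    "price_range",
)
```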