Recent research on dialogue state tracking (DST) focuses on methods that allow few- and zero-shot transfer to new domains or schemas. However, performance gains depend heavily on aggressive data augmentation and on fine-tuning ever larger language-model-based architectures. In contrast, general-purpose language models, trained on large amounts of diverse data, hold the promise of solving any kind of task without task-specific training. We present preliminary experimental results on the ChatGPT research preview, showing that ChatGPT achieves state-of-the-art performance in zero-shot DST. Despite our findings, we argue that properties inherent to general-purpose models limit their ability to replace specialized systems. We further theorize that the in-context learning capabilities of such models will likely become powerful tools to support the development of dedicated and dynamic dialogue state trackers.