Beyond Single-User Dialogue: Assessing Multi-User Dialogue State Tracking Capabilities of Large Language Models

Large language models (LLMs) have demonstrated remarkable performance in zero-shot dialogue state tracking (DST), reducing the need for task-specific training. However, conventional DST benchmarks primarily focus on structured user-agent conversations, failing to capture the complexities of real-world multi-user interactions. In this study, we assess the robustness of LLMs in multi-user DST while minimizing dataset construction costs. Inspired by recent advances in LLM-based data annotation, we extend an existing DST dataset by generating utterances of a second user based on speech act theory. Our methodology systematically incorporates a second user's utterances into conversations, enabling a controlled evaluation of LLMs in multi-user settings. Experimental results reveal a significant performance drop compared to single-user DST, highlighting the limitations of current LLMs in extracting and tracking dialogue states amidst multiple speakers. Our findings emphasize the need for future research to enhance LLMs for multi-user DST scenarios, paving the way for more realistic and robust DST models.
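The augmentation idea above, inserting a generated second user's utterances into an existing single-user dialogue and then measuring how state tracking holds up, can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' pipeline. The `Turn` structure, the speaker labels `user1`/`user2`, the rule "one second-user turn after every first-user turn", and the `generate` callable (a hypothetical stand-in for an LLM prompt conditioned on a speech act) are all invented for this sketch. Joint goal accuracy (exact match of the full slot-value state per turn) is the metric commonly used in DST; this abstract does not say which metrics the paper reports.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Dialogue state: slot name -> value, e.g. {"restaurant-area": "centre"}
State = Dict[str, str]


@dataclass
class Turn:
    speaker: str                  # hypothetical labels: "user1", "user2", or "system"
    text: str
    speech_act: str = ""          # hypothetical speech-act tag for generated turns
    state: State = field(default_factory=dict)  # gold state after this turn, if annotated


def insert_second_user(
    dialogue: List[Turn],
    generate: Callable[[List[Turn]], Turn],
) -> List[Turn]:
    """Insert one generated second-user turn after each first-user turn.

    `generate` is a hypothetical stand-in for an LLM call: it receives a copy
    of the dialogue so far and returns a second-user turn tagged with a speech act.
    """
    augmented: List[Turn] = []
    for turn in dialogue:
        augmented.append(turn)
        if turn.speaker == "user1":
            augmented.append(generate(list(augmented)))
    return augmented


def joint_goal_accuracy(predicted: List[State], gold: List[State]) -> float:
    """Fraction of gold turns whose predicted state matches exactly.

    Missing predictions (predicted shorter than gold) count as misses.
    """
    if not gold:
        return 0.0
    hits = sum(p == g for p, g in zip(predicted, gold))
    return hits / len(gold)


# Usage with a stub generator (a real pipeline would prompt an LLM here).
def stub_generator(history: List[Turn]) -> Turn:
    return Turn(speaker="user2", text="Sounds good to me.", speech_act="agree")


dialogue = [
    Turn("user1", "I need a cheap restaurant in the centre.",
         state={"restaurant-pricerange": "cheap", "restaurant-area": "centre"}),
    Turn("system", "Any cuisine preference?"),
]
augmented = insert_second_user(dialogue, stub_generator)
print([t.speaker for t in augmented])  # ['user1', 'user2', 'system']

gold = [dialogue[0].state]
pred = [{"restaurant-pricerange": "cheap"}]  # one slot missed, so the turn fails
print(joint_goal_accuracy(pred, gold))  # 0.0
```

Because joint goal accuracy requires every slot in a turn to be correct, a single slot that the model misattributes to the wrong speaker is enough to fail that turn, which is one reason a metric of this kind is sensitive to the extra speaker.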

翻译：大型语言模型（LLMs）在零样本对话状态追踪（DST）中展现出卓越性能，减少了对任务特定训练的需求。然而，传统的DST基准主要关注结构化的用户-智能体会话，未能捕捉现实世界多用户交互的复杂性。在本研究中，我们在最小化数据集构建成本的同时，评估了LLMs在多用户DST中的鲁棒性。受基于LLM的数据标注最新进展的启发，我们基于言语行为理论生成第二位用户的语句，从而扩展了现有的DST数据集。我们的方法系统地将第二位用户的语句融入对话中，实现了对LLMs在多用户场景下受控评估。实验结果显示，与单用户DST相比，性能显著下降，突显了当前LLMs在多位说话者之间提取和追踪对话状态的局限性。我们的发现强调了未来研究需要增强LLMs以应对多用户DST场景，为开发更现实、更鲁棒的DST模型铺平道路。