Large language models (LLMs) have made dialogue one of the central modes of human-machine interaction, leading to the accumulation of vast amounts of conversation logs and an increasing demand for dialogue generation. A conversational life-cycle spans from the Prelude through the Interlocution to the Epilogue, and each stage encompasses various elements. Despite numerous dialogue-related studies, there is a lack of benchmarks that cover comprehensive dialogue elements, which hinders precise modeling and systematic evaluation. To bridge this gap, we introduce a new research task, $\textbf{D}$ialogue $\textbf{E}$lement $\textbf{MO}$deling, which includes $\textit{Element Awareness}$ and $\textit{Dialogue Agent Interaction}$, and propose a novel benchmark, $\textbf{DEMO}$, designed for comprehensive dialogue modeling and assessment. Inspired by imitation learning, we further build an agent, based on the DEMO benchmark, that is adept at modeling dialogue elements. Extensive experiments indicate that existing LLMs still leave considerable room for improvement, and that our DEMO agent achieves superior performance on both in-domain and out-of-domain tasks.