Large Language Models (LLMs) demonstrate superior performance in generative scenarios and have attracted widespread attention. Among these scenarios, stylized dialogue generation is essential for building intelligent and engaging dialogue agents. However, the ability of LLMs is data-driven and limited by data bias, leading to poor performance on specific tasks. In particular, stylized dialogue generation suffers from a severe lack of supervised data. Furthermore, although many prompt-based methods have been proposed for specific tasks, their performance in complex real-world scenarios involving a wide variety of dialogue styles requires further enhancement. In this work, we first introduce StyleEval, a stylized dialogue dataset covering 38 styles, built by comprehensively leveraging the generative power of LLMs and carefully constructed under rigorous human-led quality control. Based on this dataset, we propose StyleChat, a stylized dialogue framework that combines a recitation-augmented memory strategy with a multi-task style learning strategy to promote generalization. To evaluate the effectiveness of our approach, we create a test benchmark that includes both a generation task and a choice task, comprehensively assessing whether trained models remember and understand styles and preferences. Experimental results show that our proposed framework StyleChat outperforms all baselines and helps break the style boundaries of LLMs.