Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations

In spoken dialogue, even if two current turns are the same sentence, their responses might still differ when they are spoken in different styles. Speaking styles, which carry paralinguistic and prosodic information, mark the most significant difference between the text and speech modalities. Text-only LLMs used to model spoken dialogue cannot give different responses based on the speaking style of the current turn. In this paper, we focus on enabling LLMs to listen to speaking styles and respond properly. Our goal is to teach the LLM that "even if the sentences are identical, if they are spoken in different styles, their corresponding responses might be different". Since there is no suitable dataset for achieving this goal, we collect a speech-to-speech dataset, StyleTalk, with the following desired characteristic: when two current speeches have the same content but are spoken in different styles, their responses are different. To teach LLMs to understand and respond properly to speaking styles, we propose the Spoken-LLM framework, which can model both the linguistic content and the speaking styles. We train Spoken-LLM on the StyleTalk dataset and devise a two-stage training pipeline to help Spoken-LLM better learn the speaking styles. Based on extensive experiments, we show that Spoken-LLM outperforms text-only baselines and prior speech LLM methods.
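
The dataset characteristic described above can be illustrated with a minimal sketch. It is not the StyleTalk format or the authors' code: the `Sample` fields, the style labels, and the two example dialogues are all hypothetical and made up for illustration. The check groups samples by the text of the current turn and flags any pair that differs in style yet shares the same response.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    # All field names are hypothetical, not the real StyleTalk schema.
    current_text: str    # transcript of the current turn
    current_style: str   # assumed style label for the current turn
    response_text: str   # transcript of the response


def violations(samples):
    """Return pairs with the same text, different styles, and the same response."""
    by_text = defaultdict(list)
    for s in samples:
        by_text[s.current_text].append(s)

    bad = []
    for group in by_text.values():
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                same_response = a.response_text == b.response_text
                if a.current_style != b.current_style and same_response:
                    bad.append((a, b))
    return bad


# Made-up examples: identical sentence, different styles, different responses.
samples = [
    Sample("I got the job.", "cheerful", "Congratulations! When do you start?"),
    Sample("I got the job.", "sad", "You don't sound happy. Is something wrong?"),
]

print(len(violations(samples)))
```

A list that satisfies the property yields no violations, so a check like this could serve as a sanity test when assembling data of this kind.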

