Empathetic interaction is a cornerstone of human-machine communication, as it requires understanding speech enriched with paralinguistic cues and generating emotional, expressive responses. However, the most powerful empathetic large speech language models (LSLMs) are increasingly closed off, leaving crucial details about their architectures, data, and development opaque to researchers. Given the critical need for transparent research into LSLMs and empathetic behavior, we present OpenS2S, a fully open-source, transparent, end-to-end LSLM designed to enable empathetic speech interactions. Building on our empathetic speech-to-text model BLSP-Emo, OpenS2S further employs a streaming interleaved decoding architecture to achieve low-latency speech generation. To facilitate end-to-end training, OpenS2S incorporates an automated data construction pipeline that synthesizes diverse, high-quality empathetic speech dialogues at low cost. By leveraging large language models to generate empathetic content and controllable text-to-speech systems to introduce speaker and emotional variation, we construct a scalable training corpus with rich paralinguistic diversity and minimal human supervision. We release the fully open-source OpenS2S model, including the dataset, model weights, and pre-training and fine-tuning code, to empower the broader research community and accelerate innovation in empathetic speech systems. The project webpage can be accessed at https://casia-lm.github.io/OpenS2S.
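The core idea behind streaming interleaved decoding is that the model emits text tokens and speech tokens in alternating chunks, so that speech synthesis can begin before the full text response has been decoded. The sketch below illustrates only this interleaving pattern; the chunk sizes and token representation are assumptions for illustration and do not reflect the actual OpenS2S configuration.

```python
# Illustrative sketch of chunk-wise interleaving of two token streams.
# TEXT_CHUNK and SPEECH_CHUNK are hypothetical values, not OpenS2S settings.
TEXT_CHUNK = 5     # text tokens emitted per step (assumed)
SPEECH_CHUNK = 15  # speech tokens emitted per step (assumed)

def interleave(text_tokens, speech_tokens,
               text_chunk=TEXT_CHUNK, speech_chunk=SPEECH_CHUNK):
    """Merge text and speech token streams into one interleaved sequence,
    alternating fixed-size chunks so a downstream vocoder can start
    producing audio after the first speech chunk arrives."""
    out, t, s = [], 0, 0
    while t < len(text_tokens) or s < len(speech_tokens):
        out.extend(text_tokens[t:t + text_chunk])
        t += text_chunk
        out.extend(speech_tokens[s:s + speech_chunk])
        s += speech_chunk
    return out
```

Because the first speech chunk appears after only `text_chunk` text tokens, playback latency is bounded by the first chunk rather than the full response length, which is the property the streaming architecture targets.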