DELULU: Discriminative Embedding Learning Using Latent Units for Speaker-Aware Self-Supervised Speech Foundational Model

Self-supervised speech models have achieved remarkable success on content-driven tasks, yet they remain limited in capturing speaker-discriminative features critical for verification, diarization, and profiling applications. We introduce DELULU, a speaker-aware self-supervised foundational model that addresses this limitation by integrating external supervision into the pseudo-label generation process. DELULU leverages frame-level embeddings from ReDimNet, a state-of-the-art speaker verification model, to guide the k-means clustering step during pre-training, introducing a strong speaker-discriminative inductive bias that aligns representation learning with speaker identity. The model is trained using a dual objective that combines masked prediction and denoising, further enhancing robustness and generalization. DELULU significantly outperforms prior self-supervised learning (SSL) models across a range of speaker-centric tasks, achieving up to 62% relative improvement in equal error rate (EER) for speaker verification and consistent gains on zero-shot profiling tasks such as gender, age, accent, and speaker counting. Our findings demonstrate that DELULU is a strong universal encoder for speaker-aware speech processing, enabling superior performance even without task-specific fine-tuning.
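The pseudo-labelling idea above, where clusters of speaker-embedding frames become the targets of a masked-prediction objective, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the DELULU implementation. Random synthetic vectors stand in for ReDimNet frame-level embeddings. A plain NumPy k-means stands in for the authors' clustering setup. The cluster count K = 3 is arbitrary. The denoising half of the objective is approximated by adding Gaussian noise to the encoder input while the targets still come from the clean frames. The helpers `kmeans` and `masked_prediction_loss` and the nearest-centroid scorer that plays the encoder are hypothetical. The mask here only selects which frames contribute to the loss; a real model would also hide those frames from the encoder.

```python
import numpy as np


def kmeans(features, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (illustrative; not the authors' clustering setup)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct randomly chosen frames.
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every frame to every centroid: (T, k).
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Final assignment against the final centroids gives the pseudo-labels.
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return centroids, dists.argmin(axis=1)


def masked_prediction_loss(logits, targets, mask):
    """Mean cross-entropy over masked frames only (HuBERT-style masked prediction)."""
    if not mask.any():
        raise ValueError("mask selects no frames")
    # Numerically stable log-softmax over the k cluster classes.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]
    return float(nll[mask].mean())


rng = np.random.default_rng(0)
T, D, K = 300, 16, 3  # frames, embedding dim, clusters (all assumed values)

# Hypothetical stand-in for ReDimNet frame-level embeddings: 3 synthetic speakers.
speaker_means = rng.normal(scale=5.0, size=(K, D))
speaker_of_frame = rng.integers(0, K, size=T)
clean = speaker_means[speaker_of_frame] + rng.normal(size=(T, D))

# Step 1: cluster the clean speaker embeddings into discrete pseudo-labels.
centroids, pseudo_labels = kmeans(clean, k=K)

# Step 2 (denoising assumption): the encoder sees a corrupted copy of the input
# but is still asked to predict the pseudo-labels of the clean frames.
noisy = clean + rng.normal(scale=0.5, size=(T, D))
# Hypothetical "encoder": scores each noisy frame by negative distance to each centroid.
logits = -((noisy[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)

# Step 3: mask about half the frames and compute the loss only on those frames.
mask = rng.random(T) < 0.5
loss = masked_prediction_loss(logits, pseudo_labels, mask)
print(f"masked-prediction loss: {loss:.4f}")
```

Because the targets are clusters of speaker-embedding frames rather than clusters of generic acoustic features, predicting them rewards an encoder whose representations separate speakers, which is the speaker-discriminative inductive bias the abstract describes.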
