ParaLBench: A Large-Scale Benchmark for Computational Paralinguistics over Acoustic Foundation Models

Computational paralinguistics (ComParal) aims to develop algorithms and models that automatically detect, analyze, and interpret non-verbal information in speech communication, e.g., emotion, health state, age, and gender. Despite rapid progress, the field depends heavily on elaborate models designed for specific paralinguistic tasks. The resulting heterogeneity and diversity of ComParal models largely prevent their deployment in practice. Recently, with the advent of acoustic foundation models enabled by self-supervised learning, developing more generic models that can efficiently perceive a plethora of paralinguistic information has become an active topic in speech processing. However, the field lacks a unified evaluation framework for fair and consistent performance comparison. To bridge this gap, we present a large-scale benchmark, ParaLBench, which standardizes the evaluation process for diverse paralinguistic tasks, including critical aspects of affective computing such as emotion recognition and emotion dimension prediction, across different acoustic foundation models. The benchmark contains ten datasets with thirteen distinct paralinguistic tasks, covering short-, medium-, and long-term characteristics. Each task is carried out on 14 acoustic foundation models under a unified evaluation framework, which allows for an unbiased methodological comparison and offers a grounded reference for the ComParal community. Based on the insights gained from ParaLBench, we also point out potential research directions, i.e., cross-corpus generalizability, to propel future ComParal research. The code associated with this study will be made available to foster transparency and replicability for subsequent researchers.

