This paper presents SER_AMPEL, a multi-source dataset for speech emotion recognition (SER). Its distinguishing feature is that it was collected to provide a reference for speech emotion recognition in Italian older adults. The data were gathered under different protocols: acted conversations extracted from movies and TV series, and recordings of natural conversations in which emotions were elicited by suitable questions. An analysis of the state of the art shows the need for such a dataset. Preliminary considerations on the critical issues of SER are reported, based on classification results obtained on a subset of the proposed dataset.