We release the EARS (Expressive Anechoic Recordings of Speech) dataset, a high-quality speech dataset comprising 107 speakers from diverse backgrounds and totaling 100 hours of clean, anechoic speech. The dataset covers a wide range of speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech. We benchmark various methods for speech enhancement and dereverberation on the dataset and evaluate their performance with a set of instrumental metrics. In addition, we conduct a listening test with 20 participants for the speech enhancement task, in which a generative method is preferred. We also introduce a blind test set that allows for automatic online evaluation of uploaded data. Download links for the dataset and the automatic evaluation server can be found online.