The COVID-19 pandemic has increased the use of remote telephonic interviews, making it important to distinguish scripted from spontaneous speech in audio recordings. In this paper, we propose a novel scheme for identifying read and spontaneous speech. Our approach uses a pre-trained DeepSpeech audio-to-alphabet recognition engine to generate a sequence of alphabets (characters) from the audio. From this sequence, we derive features that discriminate between read and spontaneous speech. Our experimental results show that even a small set of self-explanatory features can classify the two types of speech very effectively.
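As a rough illustration of deriving features from a recognized character sequence, the sketch below computes a handful of simple statistics. The abstract does not list the actual features, so every feature here (`chars_per_second`, `words_per_second`, `mean_word_length`, `longest_repeat_run`) and the function name `alphabet_features` are hypothetical stand-ins chosen for illustration. The input string is assumed to be the character output of the recognition engine, and the audio duration is assumed to be known.

```python
def alphabet_features(chars: str, duration_s: float) -> dict:
    """Compute a few illustrative features from a recognized character sequence.

    NOTE: these features are assumptions for illustration only; they are not
    the feature set used in the paper.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")

    words = chars.split()
    letters = [c for c in chars if c.isalpha()]

    # Longest run of one repeated non-space character (e.g. "ummm" gives 3).
    longest_run = 0
    run = 0
    prev = None
    for c in chars:
        if c == prev and c != " ":
            run += 1
        else:
            run = 1 if c != " " else 0
        prev = c
        longest_run = max(longest_run, run)

    return {
        # Letters emitted per second of audio (a crude speaking-rate proxy).
        "chars_per_second": len(letters) / duration_s,
        # Space-separated tokens per second of audio.
        "words_per_second": len(words) / duration_s,
        # Mean number of letters per space-separated token.
        "mean_word_length": (len(letters) / len(words)) if words else 0.0,
        # Longest repeated-character run, a possible hint of drawn-out fillers.
        "longest_repeat_run": longest_run,
    }
```

A call such as `alphabet_features("ummm i think so", 2.0)` returns a small dictionary that could be passed to any standard classifier. Which features actually separate read from spontaneous speech is an empirical question that this sketch does not answer.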