A novel feature, based on the chirp z-transform, that offers an improved representation of the underlying true spectrum is proposed. This feature, the chirp MFCC, is derived by computing the Mel frequency cepstral coefficients from the chirp magnitude spectrum instead of the Fourier transform magnitude spectrum. The theoretical foundations of the proposal are discussed, and experimental validation using product of likelihood Gaussians shows that the proposed chirp MFCC offers improved class separation compared with vanilla MFCC. Further, the feature is evaluated on three diverse real-world tasks: speech-music classification, speaker identification, and speech commands recognition. In all three tasks, the proposed chirp MFCC offers considerable improvements.
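The derivation described above (MFCC computed from a chirp magnitude spectrum rather than the Fourier magnitude spectrum) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the z-transform is sampled on a circle of radius `radius`, and the value 0.98, the Hamming window, the filterbank size, the unnormalised DCT-II, and all function names are hypothetical choices that the abstract does not specify. The sketch relies on the identity that sampling X(z) at z = r·exp(j2πk/N) equals the DFT of x[n]·r^(-n).

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular mel filters, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # Filter edges are equally spaced on the mel scale.
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def chirp_magnitude_spectrum(frame, radius, n_fft):
    """|X(z)| at z = radius * exp(j*2*pi*k/n_fft), k = 0 .. n_fft // 2.

    Since X(z) = sum_n x[n] z^(-n), this is the DFT of x[n] * radius**(-n).
    radius = 1.0 gives the ordinary Fourier magnitude spectrum.
    """
    n = np.arange(len(frame), dtype=float)
    return np.abs(np.fft.rfft(frame * radius ** (-n), n=n_fft))

def dct_ii(x, n_coeffs):
    """First n_coeffs coefficients of the unnormalised DCT-II of x."""
    m = len(x)
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(m)[None, :]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * m)) @ x

def chirp_mfcc(frame, sample_rate, radius=0.98, n_fft=512,
               n_filters=26, n_coeffs=13):
    """Cepstral coefficients of one frame from its chirp magnitude spectrum.

    radius=0.98 is an assumed illustrative value, not taken from the paper.
    """
    windowed = frame * np.hamming(len(frame))
    spectrum = chirp_magnitude_spectrum(windowed, radius, n_fft)
    mel_energies = mel_filterbank(n_filters, n_fft, sample_rate) @ spectrum
    # Small constant guards against log(0) for empty bands.
    return dct_ii(np.log(mel_energies + 1e-10), n_coeffs)

if __name__ == "__main__":
    sr = 16000
    t = np.arange(400) / sr                  # one 25 ms frame
    frame = np.sin(2 * np.pi * 440.0 * t)    # synthetic test tone
    print(chirp_mfcc(frame, sr))
```

Setting `radius=1.0` recovers a conventional MFCC pipeline, which makes side-by-side comparison of the two features straightforward. For contours other than a circle, SciPy's `scipy.signal.czt` evaluates the chirp z-transform more generally.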