Audio Signal Processing Using Time Domain Mel-Frequency Wavelet Coefficient

Feature extraction is the most critical step in speech signal processing. Mel Frequency Cepstral Coefficients (MFCC) are the most widely used features in speaker and speech recognition applications, because their filtering resembles the filtering that takes place in the human ear. The main drawback of MFCC is that it provides only the frequency content of the signal and gives no information about when each frequency is present. The wavelet transform, with its flexible time-frequency window, provides both time and frequency information and is an appropriate tool for analysing non-stationary signals such as speech. On the other hand, because of its uniform frequency scaling, a typical wavelet transform may be less effective in analysing speech signals, may have poorer frequency resolution at low frequencies, and may be less in line with human auditory perception. Hence, a feature that incorporates the merits of both MFCC and the wavelet transform is needed, and many studies have tried to combine the two. Existing wavelet-transform-based Mel-scaled feature extraction methods require more computation, because applying a wavelet transform on top of Mel-scale filtering adds extra processing steps. We propose a method that extracts Mel-scale features in the time domain using the concept of the wavelet transform, thereby reducing the computational burden of time-frequency conversion and the complexity of wavelet extraction. Combining the proposed Time domain Mel frequency Wavelet Coefficient (TMFWC) technique with reservoir computing significantly improved the efficiency of audio signal processing.
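
The abstract does not spell out how TMFWC is computed, so the following is only a minimal sketch of the general idea it describes: obtaining Mel-spaced band features without leaving the time domain. It assumes the common 2595·log10(1 + f/700) form of the Mel scale and uses Gaussian-windowed sinusoids (Gabor/Morlet-style kernels) as a stand-in for the wavelet filters. The function names, the `cycles` parameter, and the choice of kernel are all hypothetical and are not taken from the paper.

```python
import math

def hz_to_mel(f_hz):
    """Hz -> mel, using the common 2595*log10(1 + f/700) form (an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_center_frequencies(f_min, f_max, n_bands):
    """Return n_bands center frequencies (Hz) equally spaced on the mel scale."""
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_bands + 1)
    return [mel_to_hz(m_lo + step * (i + 1)) for i in range(n_bands)]

def gabor_kernel(fc, fs, cycles=6.0):
    """Gaussian-windowed cosine/sine pair centered at fc Hz.

    The window width scales as cycles/fc, so higher bands get shorter
    kernels (finer time resolution), as in a wavelet analysis.
    Illustrative choice only, not the kernel used in the paper.
    """
    sigma = cycles / (2.0 * math.pi * fc)        # window std-dev in seconds
    half = int(math.ceil(3.0 * sigma * fs))      # truncate at +/- 3 sigma
    gauss = [math.exp(-0.5 * ((n / fs) / sigma) ** 2) for n in range(-half, half + 1)]
    norm = sum(gauss)
    re = [g * math.cos(2.0 * math.pi * fc * n / fs) / norm
          for g, n in zip(gauss, range(-half, half + 1))]
    im = [g * math.sin(2.0 * math.pi * fc * n / fs) / norm
          for g, n in zip(gauss, range(-half, half + 1))]
    return re, im

def band_log_energies(signal, fs, centers, frame_len, hop):
    """Per-frame log energy of each band, using time-domain convolution only."""
    n = len(signal)
    band_power = []
    for fc in centers:
        re, im = gabor_kernel(fc, fs)
        half = len(re) // 2
        power = []
        for i in range(n):
            acc_re = acc_im = 0.0
            for k in range(-half, half + 1):
                j = i - k
                if 0 <= j < n:                   # zero-padding at the edges
                    acc_re += signal[j] * re[k + half]
                    acc_im += signal[j] * im[k + half]
            power.append(acc_re * acc_re + acc_im * acc_im)
        band_power.append(power)
    frames = []
    for start in range(0, n - frame_len + 1, hop):
        frames.append([math.log(sum(p[start:start + frame_len]) / frame_len + 1e-12)
                       for p in band_power])
    return frames
```

For example, `band_log_energies(x, 8000, mel_center_frequencies(200.0, 3800.0, 8), 200, 100)` returns one list of eight log energies per 200-sample frame of `x`. Each value keeps its frame position, so the feature retains the timing information that a purely spectral summary discards. The direct convolution loop is written for clarity rather than speed.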
