Audio Signal Processing Using Time Domain Mel-Frequency Wavelet Coefficient

Feature extraction is the most critical step in speech signal processing. Mel Frequency Cepstral Coefficients (MFCC) are the most widely used features in speaker and speech recognition applications, because their filtering resembles the filtering that takes place in the human ear. The main drawback of MFCC is that it provides only the frequency content of the signal and does not indicate at what time each frequency is present. The wavelet transform, with its flexible time-frequency window, provides both time and frequency information and is an appropriate tool for analysing non-stationary signals such as speech. However, because of its uniform frequency scaling, a typical wavelet transform may be less effective for analysing speech signals, may have poorer frequency resolution at low frequencies, and may be less in line with human auditory perception. Hence, a feature is needed that combines the merits of MFCC and the wavelet transform. Many studies attempt to combine the two. Existing wavelet-transform-based Mel-scaled feature extraction methods require more computation, because applying a wavelet transform on top of Mel-scale filtering adds extra processing steps. Here we propose a method that extracts Mel-scale features in the time domain by incorporating the concept of the wavelet transform, thus reducing the computational burden of time-frequency conversion and the complexity of wavelet extraction. Combining the proposed Time domain Mel frequency Wavelet Coefficient (TMFWC) technique with reservoir computing significantly improves the efficiency of audio signal processing.

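The abstract does not spell out the TMFWC algorithm, so the following is only a minimal sketch of the general idea it builds on: splitting a signal into Mel-spaced bands directly in the time domain, without a Fourier transform. It is not the authors' method. It assumes the common formula mel = 2595 log10(1 + f/700), uses plain windowed-sinc FIR band-pass filters in place of any wavelet construction, and every function name and parameter (`mel_band_edges`, `bandpass_fir`, `time_domain_mel_energies`, `n_bands`, `n_taps`, `frame_len`) is hypothetical.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert Hz to mel (assumed formula: 2595 * log10(1 + f / 700))."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_band_edges(n_bands, f_min, f_max):
    """Return n_bands + 1 band edges in Hz, equally spaced on the mel scale."""
    mels = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_bands + 1)
    return mel_to_hz(mels)

def bandpass_fir(f_lo, f_hi, fs, n_taps=201):
    """Windowed-sinc band-pass FIR, built as the difference of two low-pass filters."""
    n = np.arange(n_taps) - (n_taps - 1) / 2.0

    def lowpass(fc):
        # Ideal low-pass impulse response with cutoff fc (np.sinc is the normalized sinc).
        return 2.0 * fc / fs * np.sinc(2.0 * fc / fs * n)

    return (lowpass(f_hi) - lowpass(f_lo)) * np.hamming(n_taps)

def time_domain_mel_energies(x, fs, n_bands=8, f_min=100.0, f_max=None, frame_len=256):
    """Log energy per Mel-spaced band and per frame, computed by time-domain filtering."""
    if f_max is None:
        f_max = 0.45 * fs  # stay safely below the Nyquist frequency
    edges = mel_band_edges(n_bands, f_min, f_max)
    n_frames = len(x) // frame_len
    feats = np.empty((n_bands, n_frames))
    for b in range(n_bands):
        h = bandpass_fir(edges[b], edges[b + 1], fs)
        y = np.convolve(x, h, mode="same")  # filtering stays in the time domain
        frames = y[: n_frames * frame_len].reshape(n_frames, frame_len)
        feats[b] = np.log(np.mean(frames ** 2, axis=1) + 1e-12)
    return feats

# Usage: a 1 kHz tone should put most of its energy in the band that contains 1 kHz.
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2.0 * np.pi * 1000.0 * t)
features = time_domain_mel_energies(tone, fs)
strongest_band = int(np.argmax(features.mean(axis=1)))
```

Because each band keeps its own frame-by-frame energy track, the output retains coarse information about when each Mel band is active, which is the time-frequency property the abstract contrasts with plain MFCC. A wavelet-based variant would replace the fixed-length FIR filters with scaled wavelets; that step is left out here because the abstract gives no details.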