Acoustic recognition has become a prominent task in deep learning research. Its inputs are commonly spectral features such as the spectrogram, obtained from the Short-Time Fourier Transform (STFT), and the scalogram, obtained from the Wavelet Transform. However, few studies comprehensively discuss the advantages and drawbacks of these two methods or compare their performance. This paper evaluates the characteristics of both transforms as input data for acoustic recognition with Convolutional Neural Networks (CNNs), and reports the performance of the models trained on each transform for comparison. Based on this analysis, the paper sets out the advantages and limitations of each method, discusses the application scenarios to which each is suited, and identifies potential directions for further research.
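To make the two input representations concrete, the following is a minimal sketch of how a spectrogram and a scalogram might be computed from the same signal. It is not the pipeline used in this paper: the function names, the Hann window, the complex Morlet wavelet with `w0 = 6`, the frame and hop sizes, and the two-tone test signal are all assumptions chosen for illustration, and only NumPy is used.

```python
import numpy as np

def stft_spectrogram(x, n_fft=256, hop=64):
    """Magnitude spectrogram from a Hann-windowed STFT (hypothetical helper).

    Every frame uses the same window length, so the time-frequency
    resolution is fixed across all frequencies.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (n_fft // 2 + 1, n_frames)

def morlet_scalogram(x, fs, freqs, w0=6.0):
    """Magnitude scalogram from a complex Morlet CWT (hypothetical helper).

    The wavelet is stretched for low frequencies and compressed for high
    ones, so the time resolution varies with frequency.
    """
    rows = []
    for f in freqs:
        scale = w0 / (2 * np.pi * f)         # seconds; wavelet centre frequency is f
        half = int(np.ceil(4 * scale * fs))  # truncate the Gaussian at 4 sigma
        t = np.arange(-half, half + 1) / fs
        wavelet = np.exp(1j * w0 * t / scale) * np.exp(-0.5 * (t / scale) ** 2)
        wavelet /= np.sqrt(scale)            # 1/sqrt(scale) factor of the usual CWT
        # For the Morlet wavelet psi(-t) == conj(psi(t)), so this convolution
        # equals the correlation with the conjugate wavelet used in the CWT.
        rows.append(np.abs(np.convolve(x, wavelet, mode="same")))
    return np.array(rows)                    # (len(freqs), len(x))

# Assumed test signal: 200 Hz for the first 0.25 s, then 800 Hz.
fs = 4000
t = np.arange(2000) / fs
x = np.where(t < 0.25, np.sin(2 * np.pi * 200 * t), np.sin(2 * np.pi * 800 * t))

spec = stft_spectrogram(x)                   # shape (129, 28)
freqs = np.geomspace(100, 1600, 33)          # log-spaced analysis frequencies
scal = morlet_scalogram(x, fs, freqs)        # shape (33, 2000)
```

Either array can be treated as a single-channel image and passed to a CNN after whatever scaling or resizing the model requires. The sketch mainly illustrates the structural difference between the two inputs: the spectrogram has linearly spaced frequency bins and one column per hop, whereas this scalogram has one row per chosen analysis frequency and one column per sample.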