The study of marine mammal communication is complex, hindered by the diversity of vocalizations and by environmental factors. The Watkins Marine Mammal Sound Database (WMMD) is an extensive labeled dataset used in machine learning applications. However, the data preparation, preprocessing, and classification methods reported in the literature vary widely. This study first briefly reviews the state-of-the-art benchmarks on the dataset, with an emphasis on clarifying data preparation and preprocessing methods. We then propose applying the Wavelet Scattering Transform (WST) in place of standard methods based on the Short-Time Fourier Transform (STFT). We also address a classification task using an ad hoc deep architecture with residual layers. We outperform the existing classification architecture by $6\%$ in accuracy using WST and by $8\%$ using Mel spectrogram preprocessing, effectively halving the number of misclassified samples and reaching a top accuracy of $96\%$.