The study of marine mammal communication is complicated by the diversity of vocalizations and by environmental factors. The Watkins Marine Mammal Sound Database (WMMD) is a comprehensive labeled dataset used in machine learning applications. However, the data preparation, preprocessing, and classification methods reported in the literature vary considerably and are typically not applied to the dataset in its entirety. This study first briefly reviews the state-of-the-art benchmarks on the dataset, with a focus on clarifying data preparation and preprocessing techniques. We then explore the Wavelet Scattering Transform (WST) and the Mel spectrogram as preprocessing steps for feature extraction. We introduce \textbf{WhaleNet} (Wavelet Highly Adaptive Learning Ensemble Network), a deep ensemble architecture for classifying marine mammal vocalizations that leverages both WST and Mel spectrogram features for enhanced discrimination. By integrating the WST and Mel representations, we improve classification accuracy by $8$--$10\%$ over existing architectures, reaching a classification accuracy of $97.61\%$.
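To make the Mel spectrogram preprocessing step concrete, the following is a minimal NumPy sketch of a log-Mel spectrogram. It is an illustration under stated assumptions, not the pipeline used for WhaleNet: the function names, the HTK-style mel formula, and the parameter values (`n_fft`, `hop`, `n_mels`, the 16 kHz sampling rate) are all hypothetical choices made for the example, and a synthetic tone stands in for a WMMD recording.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (one common convention; assumed here)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose edges are equally spaced on the mel scale
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_spectrogram(signal, sr, n_fft=1024, hop=256, n_mels=64):
    # Assumes len(signal) >= n_fft; frames are Hann-windowed, no padding
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T     # (n_frames, n_mels)
    return 10.0 * np.log10(mel + 1e-10).T                 # (n_mels, n_frames), in dB

# Synthetic stand-in for a recording: one second of a 2 kHz tone
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 2000.0 * t)
spec = log_mel_spectrogram(tone, sr)
print(spec.shape)
```

The resulting two-dimensional array (mel bands by time frames) is the kind of image-like input a convolutional classifier can consume. A WST representation would be computed separately and fed to its own branch of an ensemble.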