Parameter-efficient Dual-encoder Architecture with Differentiable Choquet Integral Fusion for Underwater Acoustic Classification

Underwater acoustic classification supports a wide range of oceanic applications but is challenged by an increasingly complex acoustic environment. Waveforms and spectrograms are the primary feature representations used for classification in this domain. Spectrograms capture harmonic dependencies, but as reduced representations they can filter out acoustic features relevant for discrimination. The waveform retains phase information and therefore fully characterizes the signal, but raw waveforms can be noisy and complex, making them difficult for models to process directly. This paper proposes a dual-encoder neural architecture that processes acoustic waveforms and spectrograms simultaneously, leveraging pre-trained backbones and parameter-efficient fine-tuning modules for domain adaptation. To combine the adapted branches, a novel differentiable fuzzy aggregation mechanism based on the Choquet integral is introduced to balance the temporal and spectral representations. This fusion strategy not only yields higher classification accuracy but also provides interpretability: analysis of the learned fuzzy measures reveals class-specific shifts in the network's reliance on each representation. By dynamically shifting attention to the representation least corrupted by potential asymmetric channel distortions, the proposed gating mechanism mitigates the non-stationary challenges of the underwater environment. Evaluations on the DeepShip and ShipsEar datasets demonstrate that the proposed architecture improves classification performance over independent single-encoder baselines while restricting the number of trainable parameters. This reduces the risk of overfitting on limited acoustic datasets and alleviates the computational cost of fully fine-tuning foundation models.
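For readers unfamiliar with Choquet integral fusion, the following is a minimal sketch of the standard discrete Choquet integral for two sources (a waveform branch and a spectrogram branch). It is an illustration under stated assumptions, not the paper's implementation: the function name `choquet_two_source`, the sigmoid parameterization of the singleton fuzzy densities, and the example scores are all hypothetical. With two sources, the fuzzy measure has g(empty set) = 0 and g({waveform, spectrogram}) = 1, so only the two singleton values are free.

```python
import numpy as np


def sigmoid(x):
    """Map unconstrained parameters to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def choquet_two_source(h_wave, h_spec, theta_wave, theta_spec):
    """Discrete Choquet integral of two branch scores (illustrative sketch).

    h_wave, h_spec : scores from the waveform and spectrogram branches
                     (arrays of the same shape, e.g. one score per class).
    theta_wave, theta_spec : unconstrained parameters (assumed here);
                     the sigmoid turns them into the singleton measures
                     g({wave}) and g({spec}). Because g(empty) = 0 and
                     g({wave, spec}) = 1, any values in (0, 1) keep the
                     fuzzy measure monotone.
    """
    h_wave = np.asarray(h_wave, dtype=float)
    h_spec = np.asarray(h_spec, dtype=float)
    g_wave = sigmoid(theta_wave)
    g_spec = sigmoid(theta_spec)

    # Sort the two scores: hi is the larger, lo the smaller, elementwise.
    hi = np.maximum(h_wave, h_spec)
    lo = np.minimum(h_wave, h_spec)

    # Measure of the singleton set holding the larger score.
    g_top = np.where(h_wave >= h_spec, g_wave, g_spec)

    # Choquet integral: hi * g({top}) + lo * (g({wave, spec}) - g({top})).
    return hi * g_top + lo * (1.0 - g_top)


# Hypothetical two-class scores from each branch.
fused = choquet_two_source([0.9, 0.2], [0.5, 0.6], theta_wave=0.0, theta_spec=0.0)
print(fused)
```

The singleton measures control how the operator behaves: values near 1 make it act like a maximum, values near 0 like a minimum, and values of 0.5 like an arithmetic mean. Passing per-class arrays for `theta_wave` and `theta_spec` gives one measure per class through broadcasting. Since the output is piecewise linear in the scores and smooth in the parameters, the same expression can be written in an automatic differentiation framework and trained by gradient descent.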
