Underwater acoustic target recognition (UATR) plays a vital role in marine applications but remains challenging because labeled data are scarce and ocean environments are complex. This paper explores a central question: can speech large models (SLMs), trained on massive human speech corpora, be effectively transferred to underwater acoustics? To investigate this, we propose UATR-SLM, a simple framework that reuses the speech feature pipeline, adapts the SLM as an acoustic encoder, and adds a lightweight classifier. Experiments on the DeepShip and ShipsEar benchmarks show that UATR-SLM achieves over 99% in-domain accuracy, remains robust across variable signal lengths, and reaches up to 96.67% accuracy in cross-domain evaluation. These results highlight the strong transferability of SLMs to UATR and establish a promising paradigm for leveraging speech foundation models in underwater acoustics.
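The three-stage design named in the abstract (speech feature pipeline, SLM as acoustic encoder, lightweight classifier) can be sketched roughly as below. This is a toy illustration under stated assumptions, not the authors' implementation: `frame_features` stands in for a real speech front end, `make_frozen_encoder` is a fixed random projection standing in for a pretrained SLM, and mean pooling over time, the `LinearHead` class, and all dimensions are hypothetical choices made only to show how variable-length signals can map to a fixed-size prediction.

```python
import math
import random
from typing import Callable, List, Sequence

Frame = List[float]


def frame_features(signal: Sequence[float], frame_len: int = 400, hop: int = 160) -> List[Frame]:
    """Toy stand-in for a speech front end: log-energy and zero-crossing rate per frame."""
    if not signal:
        raise ValueError("signal must not be empty")
    frames = []
    last_start = max(len(signal) - frame_len, 0)
    for start in range(0, last_start + 1, hop):
        chunk = signal[start:start + frame_len]
        energy = sum(x * x for x in chunk) / len(chunk)
        zcr = sum(1 for a, b in zip(chunk, chunk[1:]) if a * b < 0) / len(chunk)
        frames.append([math.log(energy + 1e-10), zcr])
    return frames


def make_frozen_encoder(in_dim: int, hidden_dim: int, seed: int = 0) -> Callable[[List[Frame]], List[Frame]]:
    """Hypothetical placeholder for a pretrained SLM encoder: a fixed random tanh projection."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(hidden_dim)]

    def encode(frames: List[Frame]) -> List[Frame]:
        return [[math.tanh(sum(w * x for w, x in zip(row, f))) for row in weights] for f in frames]

    return encode


def mean_pool(hidden: List[Frame]) -> Frame:
    """Average over time so any number of frames yields one fixed-size embedding (an assumption)."""
    n = len(hidden)
    return [sum(col) / n for col in zip(*hidden)]


class LinearHead:
    """Lightweight classifier: one linear layer followed by a softmax (weights here are untrained)."""

    def __init__(self, in_dim: int, num_classes: int, seed: int = 1) -> None:
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(num_classes)]
        self.bias = [0.0] * num_classes

    def predict_proba(self, embedding: Frame) -> List[float]:
        logits = [sum(w * x for w, x in zip(row, embedding)) + b
                  for row, b in zip(self.weights, self.bias)]
        peak = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


def classify(signal: Sequence[float], encoder: Callable[[List[Frame]], List[Frame]], head: LinearHead) -> List[float]:
    """Front end -> frozen encoder -> temporal pooling -> classifier head."""
    return head.predict_proba(mean_pool(encoder(frame_features(signal))))
```

In this sketch only `LinearHead` would carry trainable parameters; whether the encoder is frozen or fine-tuned in UATR-SLM is not stated in the abstract. Because pooling collapses the time axis, `classify` returns a probability vector of the same size for a short clip and a long one.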