Existing research on audio classification struggles to recognize vessel attributes in passive underwater scenarios, and well-annotated datasets are scarce because of data privacy concerns. In this study, we introduce CLAPP (Contrastive Language-Audio Pre-training in Passive Underwater Vessel Classification), a novel model trained on a wide range of paired vessel audio and vessel state text obtained from an oceanship dataset. CLAPP learns directly from raw vessel audio and, when available, from carefully curated labels, which improves recognition of vessel attributes in passive underwater vessel scenarios. The model's zero-shot capability allows it to predict the most relevant vessel state description for a given vessel audio clip without being directly optimized for that task. Our approach addresses two challenges: vessel audio-text classification and passive underwater vessel audio attribute recognition. The proposed method achieves new state-of-the-art results on both the Deepship and Shipsear public datasets, improving accuracy by about 7%-13% over prior methods on the zero-shot task.
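As a rough illustration of the zero-shot step described above, the sketch below picks the vessel state description whose text embedding is most similar to an audio embedding. It is a minimal sketch, not the CLAPP implementation: the function names, the toy 3-dimensional embeddings, the example descriptions, and the temperature value are all assumptions made for illustration, and the audio and text encoders that would produce real embeddings are not shown.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so that dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def zero_shot_predict(audio_emb, text_embs, descriptions, temperature=0.07):
    """Return the description most similar to the audio embedding.

    audio_emb:    shape (d,), embedding of one audio clip (hypothetical encoder output)
    text_embs:    shape (n, d), one embedding per candidate description
    descriptions: list of n candidate vessel state descriptions
    temperature:  assumed softmax temperature, not a value taken from the paper
    """
    a = l2_normalize(np.asarray(audio_emb, dtype=float))
    t = l2_normalize(np.asarray(text_embs, dtype=float))
    logits = (t @ a) / temperature      # cosine similarities scaled by temperature
    logits = logits - logits.max()      # subtract the max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    best = int(np.argmax(probs))
    return descriptions[best], probs


# Toy data: hypothetical descriptions and hand-made embeddings.
descriptions = [
    "a cargo ship underway",
    "a tugboat underway",
    "a passenger ship underway",
]
text_embs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
audio_emb = np.array([0.2, 0.9, 0.1])

best_description, probs = zero_shot_predict(audio_emb, text_embs, descriptions)
print(best_description)
```

No classifier head is trained for the candidate descriptions here; the prediction comes only from similarity in a shared embedding space, which is the sense in which the task is not directly optimized.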