A Novel Score-CAM based Denoiser for Spectrographic Signature Extraction without Ground Truth

Sonar-based audio classification is a growing area of research in underwater acoustics. The sound picked up by passive sonar transducers typically contains every type of signal travelling through the ocean, and it is transformed into spectrographic images. As a result, spectrograms intended to display the time-frequency data of a particular object often include tonal regions from abundant extraneous noise that can interfere with a 'contact'. Most spectrographic samples extracted from underwater audio signals are therefore rendered unusable by clutter and lack the required distinguishability between different objects. With little clean ground-truth data available for supervised training, the creation of classification models for these audio signals is severely bottlenecked. This paper develops several new techniques to combat this problem, centered on a novel Score-CAM based denoiser that extracts an object's signature from noisy spectrographic data without being given any ground truth data. In particular, this paper proposes a novel generative adversarial network architecture for learning and producing spectrographic training data whose distribution is similar to that of low-feature spectrogram inputs. In addition, this paper proposes a generalizable class activation mapping based denoiser for different distributions of acoustic data, including real-world data distributions. Using these architectures and the proposed denoising techniques, the experiments demonstrate state-of-the-art noise reduction accuracy and higher classification accuracy than current audio classification standards. As such, the approach has applications not only to audio data but also to many other data distributions used in machine learning.
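To make the idea of a class activation mapping based denoiser concrete, the following is a minimal illustrative sketch and not the architecture proposed in this paper. It assumes the feature maps have already been upsampled to the spectrogram size, that `score_fn` is a hypothetical callable returning a classifier's target-class score, and that denoising is done by simple thresholding of the saliency map; all function names and the threshold are assumptions made for illustration.

```python
import numpy as np

def normalize01(a):
    """Scale an array to [0, 1]; a constant array maps to all zeros."""
    span = a.max() - a.min()
    return (a - a.min()) / span if span > 0 else np.zeros_like(a, dtype=float)

def score_cam(spec, activations, score_fn):
    """Score-CAM style saliency map for one spectrogram.

    spec:        (H, W) spectrogram
    activations: (K, H, W) feature maps, assumed already upsampled to (H, W)
    score_fn:    hypothetical callable returning the target-class score
    """
    # Each normalized feature map acts as a soft mask on the input.
    masks = np.stack([normalize01(a) for a in activations])
    # Score every masked input with the classifier.
    scores = np.array([score_fn(spec * m) for m in masks])
    # Softmax over the scores gives one weight per feature map.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted sum of the feature maps followed by ReLU.
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    return normalize01(cam)

def denoise(spec, cam, threshold=0.5):
    """Keep only the bins the saliency map marks as relevant (assumed rule)."""
    return np.where(cam >= threshold, spec, 0.0)
```

In this sketch no clean reference spectrogram is needed: the only supervision is the classifier score, which matches the setting of extracting a signature without ground truth.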
