Audio context determines which sound components and sources listeners treat as relevant and which they perceive as irrelevant (noise). For example, traffic noise is informative for urban surveillance but is noise for a phone call at the same location. Most current audio denoising systems apply fixed target and noise definitions, so they often remove useful components in one context while failing to suppress irrelevant components in another. To address this, we introduce the concept of automatic contextual audio denoising (ACAD), which defines target and noise based on the inferred context. In this work, we restrict context to an acoustic scene class. We label sound events that fall outside the event distribution of a scene class as out-of-context (OC), that is, noise, and events typical of that scene as in-context (IC). We implement a deep learning method that automatically infers the context of the audio signal and removes OC components, and we benchmark it against three variants: one without context inference, one with oracle context, and one with separately provided uninformative context. On paired clean/noisy data across diverse contexts, where OC components in one context may be IC in another, the proposed method outperforms the other approaches on standard objective metrics, indicating that the model can infer context and that context-dependent processing can improve denoising.
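To make the IC/OC distinction concrete, the following minimal sketch labels the same sound events differently depending on the scene class. It is an illustration only, not the proposed deep learning method: the scene names, event names, the `SCENE_EVENTS` table, and the `label_events` helper are all hypothetical, and a hard set-membership test stands in for the event distribution of a scene class.

```python
# Hypothetical scene-to-event table, for illustration only (not from the paper).
# Each scene class lists the sound events assumed to be typical for it.
SCENE_EVENTS = {
    "urban_surveillance": {"traffic", "footsteps", "car_horn"},
    "phone_call": {"speech"},
}


def label_events(scene, events):
    """Label each event as in-context (IC) or out-of-context (OC) for a scene.

    An event is IC if it is typical for the scene, otherwise OC.
    """
    typical = SCENE_EVENTS[scene]
    return {event: ("IC" if event in typical else "OC") for event in events}


# The same mixture receives different labels under different contexts.
mixture = ["speech", "traffic"]
surveillance_labels = label_events("urban_surveillance", mixture)
call_labels = label_events("phone_call", mixture)
```

In this toy table, traffic is IC for the surveillance scene and OC for the phone call, mirroring the example in the abstract. A contextual denoiser would keep the IC components and suppress the OC ones, with the scene inferred from the audio rather than supplied by hand.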