Supervised learning is a mainstream approach to audio signal enhancement (SE) and requires parallel training data consisting of noisy signals and the corresponding clean signals. Such data can only be synthesised, and the resulting mismatch with real recordings can degrade performance on real data. Moreover, clean signals may be inaccessible in certain scenarios, which renders this conventional approach infeasible. Here we explore SE using non-parallel training data consisting of noisy signals and noise, both of which can be easily recorded. We define the positive (P) and the negative (N) classes as signal inactivity and signal activity, respectively. We observe that the spectrogram patches of noise clips can be used as P data and those of noisy signal clips as unlabelled data. Thus, learning from positive and unlabelled data lets a convolutional neural network classify each spectrogram patch as P or N, which in turn enables SE.
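To make the idea of learning from positive and unlabelled data concrete, the following is a minimal sketch of one commonly used training objective, the non-negative PU risk estimator with a sigmoid surrogate loss. The abstract does not state which PU objective is used, so this choice is an assumption. The function names, the `prior` hyperparameter (the assumed fraction of P patches among the unlabelled ones), and the sign convention (positive score means P, i.e. signal inactive) are hypothetical and chosen only for illustration. The classifier itself is omitted; the functions operate on its per-patch scores.

```python
import numpy as np


def sigmoid_loss(scores, label):
    """Sigmoid surrogate loss l(z, y) = 1 / (1 + exp(y * z)), with y in {+1, -1}."""
    return 1.0 / (1.0 + np.exp(label * np.asarray(scores, dtype=float)))


def nn_pu_risk(scores_p, scores_u, prior):
    """Non-negative PU risk estimate from classifier scores.

    scores_p: scores for patches of noise clips (treated as P data).
    scores_u: scores for patches of noisy signal clips (unlabelled data).
    prior:    assumed class prior of P among unlabelled patches
              (hypothetical hyperparameter, not given in the abstract).
    """
    # Risk of P patches being classified correctly as P, weighted by the prior.
    risk_p_as_p = prior * sigmoid_loss(scores_p, +1).mean()
    # Risk of the N class, estimated without any N labels:
    # unlabelled-as-N risk minus the P contribution hidden in the unlabelled set.
    risk_n = sigmoid_loss(scores_u, -1).mean() - prior * sigmoid_loss(scores_p, -1).mean()
    # Clamp at zero so the estimate cannot become negative.
    return risk_p_as_p + max(0.0, risk_n)


def patch_mask(scores):
    """Illustrative binary mask: keep patches classified as N (signal active).

    How patch labels are turned into an enhanced signal is an assumption here.
    """
    return (np.asarray(scores, dtype=float) < 0.0).astype(float)
```

With a classifier that assigns high scores to noise-only patches and low scores to signal-active patches, the risk is close to zero, and `patch_mask` retains only the patches judged to contain signal.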