A spectrogram is a common input representation in many machine learning applications. It is derived from the short-time Fourier transform (STFT), which yields complex values; the spectrogram takes the magnitude of these values, a commonly used detector. Modern machine learning systems are commonly overparameterized, and possible ill-conditioning problems are ameliorated by regularization. The common use of rectified linear unit (ReLU) activation functions between the layers of a deep net has been shown to help this regularization, improving system performance. We extend this idea of ReLU activation to detection for the complex STFT, providing a simple-to-compute modified and regularized spectrogram that potentially results in better-behaved training. We confirm the benefit of this approach on a noisy acoustic data set used for a real-world application, where generalization performance improved substantially. This approach might also benefit other applications that use time-frequency mappings, in acoustics, audio, and other domains.
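
To make the contrast between magnitude detection and a ReLU-style detection concrete, the following is a minimal sketch in plain Python. The abstract does not specify the exact detector, so `relu_detector` (rectifying the real and imaginary parts separately and summing them) is purely a hypothetical illustration, not the authors' method. The helper names, the Hann window, and the frame and hop sizes are likewise assumptions chosen for brevity.

```python
import cmath
import math


def stft(signal, frame_len=8, hop=4):
    """Naive STFT: Hann-windowed frames, direct DFT, non-negative frequency bins only."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = [
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(bins)
    return frames


def magnitude_detector(z):
    """Conventional spectrogram detector: magnitude of the complex STFT value."""
    return abs(z)


def relu(x):
    """Rectified linear unit on a real number."""
    return max(0.0, x)


def relu_detector(z):
    # ASSUMPTION: one hypothetical way to apply ReLU to a complex value,
    # rectifying the real and imaginary parts separately and summing them.
    # The abstract does not define the detector actually used.
    return relu(z.real) + relu(z.imag)


def detect(frames, detector):
    """Apply a detector to every time-frequency cell of the STFT."""
    return [[detector(z) for z in bins] for bins in frames]


# Test tone centered on bin 2 of an 8-point frame.
signal = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
frames = stft(signal)
spectrogram = detect(frames, magnitude_detector)
relu_spectrogram = detect(frames, relu_detector)
```

Both outputs are non-negative time-frequency maps of the same shape, so either could be fed to the same downstream network. Unlike the magnitude, the hypothetical ReLU variant in this sketch depends on the phase of each STFT value.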