We investigate applying audio manipulations using pretrained neural network-based autoencoders as an alternative to traditional signal processing methods, since the former may provide greater semantic or perceptual organization. To establish the potential of this approach, we first determine whether representations from these models encode information about manipulations. We carry out experiments and produce visualizations using representations from two different pretrained autoencoders. Our findings indicate that, while some information about audio manipulations is encoded, this information is limited and encoded in a non-trivial way. Our visualizations support this: trajectories of representations under common manipulations are typically nonlinear and content dependent, even for linear signal manipulations. As a result, it is not yet clear how these pretrained autoencoders can be used to manipulate audio signals; however, our results suggest this may be due to a lack of disentanglement with respect to common audio manipulations.
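The claim that even a linear manipulation can trace a nonlinear latent trajectory can be illustrated with a toy sketch. This is not the authors' method: the two encoders below are hypothetical stand-ins (a random linear projection and the same projection followed by a saturating `tanh`), the input is random noise rather than audio, and the `straightness` score (chord length divided by path length) is an assumed measure chosen only for illustration. The sketch sweeps a gain, which is a linear signal manipulation, and checks how straight the resulting latent path is.

```python
import numpy as np


def trajectory(encoder, x, params, manipulate):
    """Encode manipulate(x, p) for each parameter value p; returns a (len(params), d) array."""
    return np.stack([encoder(manipulate(x, p)) for p in params])


def straightness(z):
    """Chord length / path length: 1.0 for a straight monotone path, < 1.0 if it bends."""
    steps = np.linalg.norm(np.diff(z, axis=0), axis=1)
    path = steps.sum()
    chord = np.linalg.norm(z[-1] - z[0])
    return chord / path if path > 0 else 1.0


def apply_gain(x, g):
    # Linear signal manipulation: scale the amplitude by g.
    return g * x


rng = np.random.default_rng(0)
x = rng.standard_normal(256)              # hypothetical stand-in for an audio frame
W = rng.standard_normal((8, 256)) / 16.0  # hypothetical encoder weights


def linear_encoder(s):
    # Toy linear "encoder": gain scales the latent along the fixed direction W @ x.
    return W @ s


def nonlinear_encoder(s):
    # Toy nonlinear "encoder": tanh saturates each latent dimension at a different rate.
    return np.tanh(W @ s)


gains = np.linspace(0.1, 4.0, 40)
print("linear:   ", straightness(trajectory(linear_encoder, x, gains, apply_gain)))
print("nonlinear:", straightness(trajectory(nonlinear_encoder, x, gains, apply_gain)))
```

For the linear encoder the score is 1 up to floating-point error, yet the direction of the path, `W @ x`, still depends on the input, which is a simple form of content dependence. Once the encoder saturates, the same gain sweep bends the path, so a single fixed latent direction no longer corresponds to "turn it up".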