We propose an end-to-end music mixing style transfer system that converts the mixing style of an input multitrack to that of a reference song. This is achieved with an encoder pre-trained with a contrastive objective to extract only audio-effects-related information from a reference music recording. All our models are trained in a self-supervised manner on an already-processed wet multitrack dataset, using an effective data preprocessing method that alleviates the scarcity of unprocessed dry data. We analyze the proposed encoder's ability to disentangle audio effects and validate its performance on mixing style transfer through both objective and subjective evaluations. The results show that the proposed system not only converts the mixing style of multitrack audio close to that of a reference but also remains robust for mixture-wise style transfer when combined with a music source separation model.
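The contrastive pre-training of the encoder can be illustrated with a generic InfoNCE-style loss. This is a minimal sketch under stated assumptions, not the loss used by the proposed system: the function name `info_nce_loss`, the temperature value, and the pairing scheme (row i of both arrays is assumed to hold embeddings of two different recordings processed with the same audio effects chain) are hypothetical choices made for illustration.

```python
import numpy as np


def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE-style contrastive loss (illustrative assumption).

    anchors, positives: arrays of shape (batch, dim). Row i of each array is
    assumed to share the same audio effects; all other rows act as negatives.
    """
    # L2-normalize so the dot product below is a cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # Pairwise similarities, scaled by the temperature.
    logits = (a @ p.T) / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The matching pair sits on the diagonal; average its negative log-probability.
    return float(-np.mean(np.diag(log_probs)))
```

The loss is small when each embedding is most similar to its own partner and large when it is more similar to another item in the batch, which is the pressure that would push an encoder to group recordings by their effects rather than by their musical content.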