Music Source Restoration (MSR) extends source separation to realistic settings in which signals undergo production effects (equalization, compression, reverb) and real-world degradations, with the goal of recovering the original unprocessed sources. Existing benchmarks cannot measure restoration fidelity: synthetic datasets use unprocessed stems but unrealistic mixtures, while real production datasets provide only already-processed stems with no clean references. We present MSRBench, the first benchmark explicitly designed for MSR evaluation. MSRBench contains raw stem-mixture pairs across eight instrument classes, with mixtures produced by professional mixing engineers. These raw-processed pairs enable direct evaluation of both separation accuracy and restoration fidelity. Beyond controlled studio conditions, the mixtures are augmented with twelve real-world degradations spanning analog artifacts, acoustic environments, and lossy codecs. Baseline experiments with U-Net and BSRNN achieve SI-SNR of -37.8 dB and -23.4 dB, respectively, with a perceptual Fréchet Audio Distance on CLAP embeddings (FAD CLAP) of around 0.7-0.8, demonstrating substantial room for improvement and the need for restoration-specific architectures.
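The abstract reports two metrics: SI-SNR against the raw stem and FAD computed on CLAP embeddings. The sketch below shows how both can be computed with NumPy, under stated assumptions: it uses the common zero-mean SI-SNR definition, and it computes a Fréchet distance between Gaussians fitted to embeddings that are assumed to be precomputed (CLAP embedding extraction is not shown). MSRBench's actual evaluation code may differ. The function names `si_snr` and `frechet_distance` are hypothetical and illustrative, not part of any released MSRBench API.

```python
import numpy as np


def si_snr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SNR (dB) between 1-D estimate and reference signals.

    Assumes the common zero-mean definition; hypothetical helper, not MSRBench code.
    """
    # Remove DC offsets so constant shifts do not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the optimally scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Whatever the scaled reference cannot explain counts as error.
    noise = estimate - target
    return float(10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps)))


def frechet_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two (n, d) embedding sets.

    The embeddings (e.g. CLAP) are assumed to be precomputed; hypothetical helper.
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)
    # Tr(sqrt(cov_a @ cov_b)) equals the sum of square roots of the eigenvalues
    # of the symmetric matrix sqrt(cov_a) @ cov_b @ sqrt(cov_a).
    w, v = np.linalg.eigh(cov_a)
    sqrt_a = (v * np.sqrt(np.clip(w, 0, None))) @ v.T
    eig = np.linalg.eigvalsh(sqrt_a @ cov_b @ sqrt_a)
    tr_covmean = np.sqrt(np.clip(eig, 0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2 * tr_covmean)
```

Because SI-SNR rescales the reference to best match the estimate, a pure gain difference is not penalized, while residual coloration from un-reversed effects such as equalization or compression counts as error. FAD is a distance, so lower values indicate closer agreement with the reference distribution.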