This paper introduces The Spheres dataset, a collection of multitrack orchestral recordings designed to advance machine learning research in music source separation and related music information retrieval (MIR) tasks within the classical music domain. The dataset comprises over one hour of recordings performed by the Colibrì Ensemble at The Spheres recording studio, capturing two canonical works, Tchaikovsky's Romeo and Juliet and Mozart's Symphony No. 40, along with chromatic scales and solo excerpts for each instrument. The recording setup employed 23 microphones, including close spot, main, and ambient microphones, which enables the creation of realistic stereo mixes with controlled bleeding and provides isolated stems for supervised training of source separation models. In addition, room impulse responses were estimated for each instrument position, offering an acoustic characterization of the recording space. We present the dataset structure, an acoustic analysis, and baseline evaluations using X-UMX-based models for orchestral family separation and microphone debleeding. The results highlight both the potential and the challenges of source separation in complex orchestral scenarios, underscoring the dataset's value for benchmarking and for exploring new approaches to separation, localization, dereverberation, and immersive rendering of classical music.