Music demixing is the task of separating a single mixed audio signal into its component tracks (stems), such as drums, bass, vocals, and the remaining accompaniment. Source separation is useful in a range of areas, including entertainment and hearing aids. In this paper, we introduce two new benchmarks for sound source separation and compare popular demixing models, as well as their ensembles, on these benchmarks. To assess the models, we provide a leaderboard at https://mvsep.com/quality_checker/ that compares a range of models. The new benchmark datasets are available for download. We also develop a novel approach to audio separation based on ensembling different models, each best suited to a particular stem. The proposed solution was evaluated in the Music Demixing Challenge 2023 and achieved top results in different tracks of the challenge. The code and the approach are open-sourced on GitHub.
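The per-stem ensembling idea can be illustrated with a minimal sketch. The abstract does not state how the model outputs are combined, so the weighted average of waveforms used here is an assumption, and the function names (`ensemble_stem`, `ensemble_all`), the weights, and the toy arrays are all hypothetical rather than taken from the released code.

```python
import numpy as np


def ensemble_stem(estimates, weights):
    """Weighted average of several models' estimates for one stem.

    estimates: list of arrays, each shaped (channels, samples)
    weights:   one non-negative weight per model (assumed, not from the paper)
    """
    stacked = np.stack(estimates, axis=0).astype(np.float64)
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()  # normalise so the weights sum to 1
    # Contract the model axis: (n_models,) x (n_models, channels, samples)
    return np.tensordot(w, stacked, axes=1)


def ensemble_all(model_outputs, stem_weights):
    """Combine per-model outputs stem by stem.

    model_outputs: list of dicts mapping stem name -> waveform array
    stem_weights:  dict mapping stem name -> list of per-model weights
    """
    return {
        stem: ensemble_stem([out[stem] for out in model_outputs], weights)
        for stem, weights in stem_weights.items()
    }


# Toy data: two hypothetical models, two stems, stereo, four samples.
model_a = {"vocals": np.ones((2, 4)), "bass": np.zeros((2, 4))}
model_b = {"vocals": np.zeros((2, 4)), "bass": np.ones((2, 4))}

# Hypothetical weights: model A is trusted more for vocals.
stem_weights = {"vocals": [3.0, 1.0], "bass": [1.0, 1.0]}

mixed = ensemble_all([model_a, model_b], stem_weights)
```

Giving each stem its own weight vector is what lets a model that is strong on one stem dominate that stem without affecting the others. In practice such weights would be tuned against a benchmark score rather than set by hand.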