This paper summarizes the cinematic demixing (CDX) track of the Sound Demixing Challenge 2023 (SDX'23). We describe the challenge setup in detail, including the structure of the competition and the datasets used. In particular, we present CDXDB23, a new hidden dataset constructed from real movies that was used to rank the submissions. The paper also offers insights into the most successful approaches employed by participants. Compared to the cocktail-fork baseline, the best-performing system trained exclusively on the simulated Divide and Remaster (DnR) dataset achieved an improvement of 1.8 dB in signal-to-distortion ratio (SDR), whereas the top-performing system on the open leaderboard, where any data could be used for training, achieved a much larger improvement of 5.7 dB. A major source of this improvement was making the simulated data better match real cinematic audio, which we investigate in detail.
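To give a feel for what the reported dB gains mean, the sketch below assumes the simple energy-ratio ("global") SDR, 10·log10(‖s‖² / ‖s − ŝ‖²), as used in earlier Music Demixing challenges. The exact CDX scoring (for example, how scores are averaged over sources and tracks) may differ and is not modeled here. For a single fixed reference signal, an SDR gain of Δ dB means the residual error energy shrinks by a factor of 10^(−Δ/10). A 1.8 dB gain corresponds to roughly 0.66 times the original error energy, and a 5.7 dB gain to roughly 0.27 times. The function names `sdr_db` and `error_energy_ratio` are illustrative and hypothetical.

```python
import math
from typing import Sequence


def sdr_db(reference: Sequence[float], estimate: Sequence[float], eps: float = 1e-8) -> float:
    """Energy-ratio SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumption: this is the plain "global" SDR; the official CDX metric
    may differ in details such as averaging over sources and tracks.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    # eps guards against division by zero and log of zero for silent signals
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


def error_energy_ratio(delta_sdr_db: float) -> float:
    """Factor by which the residual energy shrinks for an SDR gain on the same reference."""
    return 10.0 ** (-delta_sdr_db / 10.0)


if __name__ == "__main__":
    print(f"1.8 dB gain -> error energy x {error_energy_ratio(1.8):.3f}")
    print(f"5.7 dB gain -> error energy x {error_energy_ratio(5.7):.3f}")
```

Because SDR is logarithmic, gains of equal size in dB correspond to equal *relative* reductions of the error energy, whatever the starting SDR was.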