This paper summarizes the cinematic demixing (CDX) track of the Sound Demixing Challenge 2023 (SDX'23). We describe the challenge setup, including the structure of the competition and the datasets used. In particular, we detail CDXDB23, a new hidden dataset constructed from real movies that was used to rank the submissions. The paper also offers insights into the most successful approaches employed by participants. Compared to the cocktail-fork baseline, the best-performing system trained exclusively on the simulated Divide and Remaster (DnR) dataset achieved an improvement of 1.8 dB in signal-to-distortion ratio (SDR), whereas the top-performing system on the open leaderboard, where any data could be used for training, achieved an improvement of 5.7 dB. A major source of this improvement was making the simulated data better match real cinematic audio, which we investigate in detail.