Adding Foley sound effects during post-production is a common technique for enhancing the perceived acoustic properties of multimedia content. Traditionally, Foley sound has been produced by human Foley artists, a process that involves manually recording and mixing sound. However, recent advances in sound synthesis and generative models have spurred interest in machine-assisted or automatic Foley synthesis techniques. To promote further research in this area, we organized a challenge in DCASE 2023: Task 7 - Foley Sound Synthesis. The challenge aims to provide a standardized evaluation framework that is both rigorous and efficient, allowing different Foley synthesis systems to be evaluated. Through this challenge, we hope to encourage active participation from the research community and advance the state of the art in automatic Foley synthesis. In this technical report, we provide a detailed overview of the Foley sound synthesis challenge, including the task definition, dataset, baseline, evaluation scheme and criteria, and discussion.