Adding Foley sound effects during post-production is a common technique for enhancing the perceived acoustic properties of multimedia content. Traditionally, Foley sound has been produced by human Foley artists, who record and mix the sounds manually. However, recent advances in sound synthesis and generative models have sparked interest in machine-assisted or automatic Foley synthesis techniques. To promote further research in this area, we organized a challenge in DCASE 2023: Task 7 - Foley Sound Synthesis. The challenge aims to provide a standardized evaluation framework that is both rigorous and efficient, allowing different Foley synthesis systems to be compared. We received 17 submissions and performed both objective and subjective evaluation to rank them according to three criteria: audio quality, fit-to-category, and diversity. Through this challenge, we hope to encourage active participation from the research community and advance the state of the art in automatic Foley synthesis. In this technical report, we provide a detailed overview of the Foley sound synthesis challenge, including the task definition, dataset, baseline, evaluation scheme and criteria, challenge results, and discussion.