Sound effects (SFX) datasets and libraries often employ distinct tagging schemes, taxonomies, and metadata structures. This hampers research on SFX classification and generation: incompatible taxonomies produce siloed datasets that may require dataset-specific approaches, yield results that cannot be compared, and prevent merging data across sources. We propose a modular dataset relabeling framework that adopts the Universal Category System (UCS), an industry-standard hierarchical taxonomy for sound effects, as a shared structural foundation. The open-source framework (i) converts the tags of existing datasets to UCS through a rule-based, multi-stage pipeline with conflict resolution, achieving high automatic conversion rates, (ii) suggests a stratified dataset split for the new labels, and (iii) combines multiple datasets. To demonstrate its practical utility, we introduce EnvSound-UCS, a publicly available, unified, UCS-compliant dataset of environmental sounds comprising 58,057 sound clips from three sources: AudioSet, FSD50K, and ESC-50.
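The rule-based, multi-stage conversion with conflict resolution described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the framework's actual implementation: the two stages (exact match, then keyword match), the vote-then-stage tie-breaking rule, the function names `map_tag` and `relabel`, and the category strings, which are placeholders rather than verified UCS entries.

```python
from collections import Counter

# Hypothetical rule tables. The category strings are illustrative
# placeholders, not verified entries of the UCS specification.
EXACT_RULES = {
    "dog": "ANIMALS/DOG",
    "bark": "ANIMALS/DOG",
    "rain": "RAIN/GENERAL",
}
KEYWORD_RULES = [
    ("thunder", "WEATHER/THUNDER"),
    ("rain", "RAIN/GENERAL"),
]


def map_tag(tag):
    """Return (category, stage) for one source tag, or None if no rule fires."""
    key = tag.strip().lower()
    if key in EXACT_RULES:  # stage 1: exact match on the normalized tag
        return EXACT_RULES[key], 1
    for keyword, category in KEYWORD_RULES:  # stage 2: keyword substring match
        if keyword in key:
            return category, 2
    return None


def relabel(tags):
    """Map a clip's source tags to a single category, or None if unmapped."""
    hits = [hit for hit in map(map_tag, tags) if hit is not None]
    if not hits:
        return None  # no rule fired; such clips would need manual review
    votes = Counter(category for category, _ in hits)
    best_stage = {}
    for category, stage in hits:
        best_stage[category] = min(stage, best_stage.get(category, stage))
    # Conflict resolution: most votes wins; ties go to the category matched
    # at the earliest stage, then to alphabetical order for determinism.
    return min(votes, key=lambda c: (-votes[c], best_stage[c], c))
```

For example, `relabel(["Dog", "Bark", "Rain"])` resolves to the dog placeholder category by majority vote, while `relabel(["rain", "thunderclap"])` is a one-to-one tie that the exact-match stage decides in favour of the rain category. Clips for which `relabel` returns `None` are the ones an automatic pipeline of this kind would leave unconverted.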