Generative audio models are rapidly advancing in both capability and public use: several powerful generative audio models have openly available weights, and some technology companies have released high-quality generative audio products. Yet, while prior work has enumerated many ethical issues stemming from the data on which generative visual and textual models are trained, we have little understanding of similar issues in generative audio datasets, including those related to bias, toxicity, and intellectual property. To bridge this gap, we conducted a literature review of hundreds of audio datasets and selected seven of the most prominent to audit in detail. We found that these datasets are biased against women, contain toxic stereotypes about marginalized communities, and include significant amounts of copyrighted work. To enable artists to check whether their work appears in popular audio datasets, and to facilitate exploration of the contents of these datasets, we developed a web-based audio dataset exploration tool at https://audio-audit.vercel.app.