Generative audio models are typically applied to music and speech generation, and recent models produce audio of human-like quality. This paper presents a systematic literature review of 884 papers on generative audio models, with two aims: to quantify the degree to which researchers in the field consider potential negative impacts, and to identify the types of ethical implications that researchers in this area need to consider. Although 65% of generative audio research papers note potential positive impacts of their work, fewer than 10% discuss any negative impacts. This strikingly small percentage is particularly worrying because the few papers that do discuss negative impacts raise serious ethical concerns relevant to the broader field, such as the potential for fraud, deep-fakes, and copyright infringement. By quantifying this lack of ethical consideration in generative audio research and identifying key areas of potential harm, this paper lays the groundwork for future work at a critical point in time, with the goal of guiding more conscientious research as the field progresses.