A persistent challenge in generative audio models is data replication, where the model unintentionally reproduces parts of its training data during inference. In this work, we address this issue in text-to-audio diffusion models by applying anti-memorization strategies. We adopt Anti-Memorization Guidance (AMG), a technique that modifies the sampling process of pre-trained diffusion models to discourage memorization. Our study explores three types of guidance within AMG, each designed to reduce replication while preserving generation quality. We use Stable Audio Open as our backbone, leveraging its fully open-source architecture and training dataset. Our comprehensive experimental analysis suggests that AMG significantly mitigates memorization in diffusion-based text-to-audio generation without compromising audio fidelity or semantic alignment.