In this paper, we explore audio editing with non-rigid text edits. We show that the proposed editing pipeline produces edits that remain faithful to the input audio. We explore text prompts that perform addition, style transfer, and inpainting. We show, both quantitatively and qualitatively, that these edits outperform Audio-LDM, a recently released text-prompted audio generation model. Qualitative inspection of the results indicates that the edits produced by our approach remain more faithful to the input audio, in that they preserve the original onsets and offsets of the audio events.