We present SoundMorpher, a sound morphing method that generates perceptually uniform morphing trajectories using a diffusion model. Traditional sound morphing methods model the intractable relationship between the morph factor and the perception of the resulting sounds under a linear assumption, which oversimplifies the complex nature of sound perception and limits morph quality. In contrast, SoundMorpher explores an explicit proportional mapping between the morph factor and the perceptual stimuli of morphed sounds based on the Mel-spectrogram. This approach enables smoother transitions between intermediate sounds and ensures perceptually consistent transformations, and it can be easily extended to diverse sound morphing tasks. Furthermore, we present a set of quantitative metrics to comprehensively assess sound morphing systems based on three objective criteria: correspondence, perceptual intermediateness, and smoothness. We provide extensive experiments to demonstrate the effectiveness and versatility of SoundMorpher in real-world scenarios, highlighting its potential impact on applications such as creative music composition, film post-production, and interactive audio technologies.
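To make the idea of a proportional mapping between morph factor and perceived change more concrete, the following is a minimal sketch under stated assumptions, not SoundMorpher's actual procedure. It assumes a dense set of candidate morphs is already available, each summarized by a feature vector (for example, a flattened Mel-spectrogram), and it uses plain Euclidean distance between consecutive features as a stand-in for a perceptual measure. The function name `uniform_morph_factors` and the toy feature curve are hypothetical and chosen only for illustration.

```python
import numpy as np

def uniform_morph_factors(features, alphas, n_steps):
    """Pick interpolation factors whose cumulative feature distance is evenly spaced.

    features: array of shape (N, ...) with one feature per candidate morph
              (assumed here to stand in for a Mel-spectrogram).
    alphas:   array of shape (N,), increasing interpolation factors in [0, 1].
    n_steps:  number of morph steps to return, endpoints included.
    """
    features = np.asarray(features, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    # Euclidean distance between consecutive candidates (assumed perceptual proxy).
    diffs = np.diff(features.reshape(len(features), -1), axis=0)
    step = np.linalg.norm(diffs, axis=1)
    # Cumulative distance along the trajectory, normalized to [0, 1].
    # Assumes the endpoints differ, so the total distance is non-zero.
    cum = np.concatenate([[0.0], np.cumsum(step)])
    cum /= cum[-1]
    # Invert the cumulative curve: for each evenly spaced target,
    # find the interpolation factor that reaches it.
    targets = np.linspace(0.0, 1.0, n_steps)
    return np.interp(targets, cum, alphas)

# Toy example: the feature changes quadratically with the interpolation factor,
# so evenly spaced factors would produce uneven feature changes.
alphas = np.linspace(0.0, 1.0, 101)
features = alphas ** 2
picked = uniform_morph_factors(features, alphas, 5)
print(picked)  # roughly the square roots of the evenly spaced targets
```

In this toy case, evenly spaced interpolation factors would bunch most of the change near the end of the trajectory. The resampled factors instead advance the feature by equal amounts per step, which is the property a perceptually uniform morph aims for under whatever perceptual measure is used.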