We present SoundMorpher, an open-world sound morphing method designed to generate perceptually uniform morphing trajectories. Traditional sound morphing techniques typically assume a linear relationship between the morphing factor and sound perception: they achieve smooth transitions by linearly interpolating the semantic features of the source and target sounds while gradually adjusting the morphing factor. However, this assumption oversimplifies the complexity of sound perception and limits morphing quality. In contrast, SoundMorpher explores an explicit relationship between the morphing factor and the perception of morphed sounds, leveraging log Mel-spectrogram features. It then refines the morphing sequence by fixing a constant target perceptual difference for each transition and determining the corresponding morphing factors using binary search. To address the lack of a formal quantitative evaluation framework for sound morphing, we propose a set of metrics based on three established objective criteria. These metrics enable comprehensive assessment of morphed results and facilitate direct comparisons between methods, fostering advancements in sound morphing research. Extensive experiments demonstrate the effectiveness and versatility of SoundMorpher in real-world scenarios, showcasing its potential in applications such as creative music composition, film post-production, and interactive audio technologies. Our demo and code are available at~\url{https://xinleiniu.github.io/SoundMorpher-demo/}.
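The binary-search step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a hypothetical function `perceived(alpha)` that maps a morphing factor in [0, 1] to a normalized perceptual position between source (0) and target (1), and that this function is non-decreasing. The toy curve `toy_perceived` is a stand-in for a measure derived from log Mel-spectrogram features; all names here are illustrative.

```python
from typing import Callable, List


def find_alpha(perceived: Callable[[float], float], target: float,
               tol: float = 1e-6, max_iter: int = 100) -> float:
    """Binary-search a morphing factor whose perceived position is near target.

    Assumption: perceived is non-decreasing on [0, 1].
    """
    lo, hi = 0.0, 1.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        value = perceived(mid)
        if abs(value - target) <= tol:
            return mid
        if value < target:
            lo = mid  # perceived position too small: move right
        else:
            hi = mid  # perceived position too large: move left
    return 0.5 * (lo + hi)


def uniform_morph_factors(perceived: Callable[[float], float],
                          n_steps: int, tol: float = 1e-6) -> List[float]:
    """Return n_steps factors with equal perceptual spacing between neighbors."""
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    # Endpoints are pinned to the pure source and pure target sounds.
    factors = [0.0]
    for i in range(1, n_steps - 1):
        target = i / (n_steps - 1)  # constant perceptual step between points
        factors.append(find_alpha(perceived, target, tol))
    factors.append(1.0)
    return factors


def toy_perceived(alpha: float) -> float:
    """Hypothetical nonlinear perception curve, used only for illustration."""
    return alpha * alpha


if __name__ == "__main__":
    print(uniform_morph_factors(toy_perceived, 5))
```

With the toy curve, the returned factors are unevenly spaced in `alpha` even though the perceived positions are evenly spaced, which is the contrast with plain linear interpolation of the morphing factor.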