We present SingVisio, an interactive visual analysis system that aims to explain the diffusion model used in singing voice conversion. SingVisio visualizes the model's generation process, showing how a noisy spectrum is denoised step by step into a clean spectrum that captures the desired singer's timbre. The system also supports side-by-side comparison of different conditions, such as source content, melody, and target timbre, highlighting how these conditions affect the diffusion generation process and the resulting conversions. Comprehensive evaluations demonstrate SingVisio's effectiveness in terms of system design, functionality, explainability, and user-friendliness. The system offers users of various backgrounds valuable learning experiences and insights into the diffusion model for singing voice conversion.