Instruction-guided speech editing requires a model to modify specified speech attributes while preserving unrelated characteristics. Despite rapid progress in Speech Large Language Models (Speech LLMs), systematic evaluation of this capability remains challenging because existing benchmarks are fragmented across isolated editing tasks. To bridge this gap, we introduce SpeechEditBench, a bilingual, multi-attribute benchmark for instruction-guided speech editing. SpeechEditBench encompasses seven atomic editing tasks as well as compositional editing tasks that combine multiple operations within a single instruction. We propose an anchor-based evaluation protocol that separately assesses edit success on target attributes and preservation of untargeted attributes, yielding three metrics: target success, preservation success, and joint success. Using this benchmark, we evaluate mainstream Speech LLMs and specialized speech editing systems. The results reveal three key findings: (1) no single model performs well across all editing dimensions; (2) closed-source Speech LLMs generally outperform open-source models; and (3) compositional editing remains highly challenging, with even the most advanced models struggling to achieve high joint success. SpeechEditBench provides a rigorous diagnostic framework for identifying bottlenecks in Speech LLMs, thereby facilitating the development of next-generation models with more robust and precise instruction-guided editing capabilities. Data and code are available at https://github.com/daxintan-cuhk/SpeechEditBench .
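To make the relationship between the three metrics concrete, the following is a minimal sketch of how per-sample outcomes might be aggregated. It is not the benchmark's implementation: the `EditResult` type, its field names, and the `success_rates` function are hypothetical, and the sketch assumes that each sample receives a binary target verdict and a binary preservation verdict, with joint success defined as both holding on the same sample. The abstract does not specify these details, so the released code should be consulted for the actual definitions.

```python
from dataclasses import dataclass


@dataclass
class EditResult:
    """Hypothetical per-sample verdicts (assumed binary for this sketch)."""
    target_ok: bool     # the targeted attribute was edited as instructed
    preserved_ok: bool  # the untargeted attributes were left unchanged


def success_rates(results):
    """Aggregate per-sample verdicts into three rates in [0, 1].

    Assumption: joint success requires both verdicts to be true
    for the same sample.
    """
    n = len(results)
    if n == 0:
        raise ValueError("no results to aggregate")
    target = sum(r.target_ok for r in results) / n
    preservation = sum(r.preserved_ok for r in results) / n
    joint = sum(r.target_ok and r.preserved_ok for r in results) / n
    return {"target": target, "preservation": preservation, "joint": joint}


if __name__ == "__main__":
    demo = [
        EditResult(True, True),
        EditResult(True, False),
        EditResult(False, True),
        EditResult(False, False),
    ]
    print(success_rates(demo))
```

Under this assumed definition, joint success can never exceed either of the other two rates, which is consistent with the observation that joint success is the hardest of the three metrics to achieve.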