Controllable music editing aims to modify high-level attributes while strictly preserving rhythmic and melodic structure. The task is hampered by semantic-structural entanglement: steering methods often degrade structure to gain editing strength, while structural adaptors suppress semantic responsiveness. We propose AnchorSteer, a framework that resolves this tension by coupling structural anchoring with self-discovered semantic steering. The approach probes internal representations to extract interpretable, label-free concept vectors via a self-supervised reconstruction objective, isolating attributes without curated data. During editing, these portable, plug-and-play concept vectors are injected into the diffusion model's hidden manifolds while a structural adaptor enforces structural consistency. We provide unconditioned and conditioned injection variants to balance robustness against semantic strength. Experiments on ZoME-Bench and subjective tests show that AnchorSteer outperforms both steering-only and anchoring-only baselines, enabling significant semantic transformations with high-fidelity structural preservation.
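To make the idea of injecting a concept vector into hidden representations concrete, the following is a minimal sketch only. It assumes the common additive form of activation steering, in which each hidden state is shifted along a unit-normalized direction; the abstract does not specify AnchorSteer's actual injection rule, and the structural adaptor is not modeled here. The function name `inject_concept`, the `strength` parameter, and the toy array shapes are all hypothetical.

```python
import numpy as np


def inject_concept(hidden, concept, strength=1.0):
    """Shift every hidden state along a unit-normalized concept direction.

    Assumption: plain additive steering, not necessarily the paper's rule.
    hidden:  array of shape (frames, dim)
    concept: array of shape (dim,)
    """
    hidden = np.asarray(hidden, dtype=float)
    concept = np.asarray(concept, dtype=float)
    norm = np.linalg.norm(concept)
    if norm == 0.0:
        raise ValueError("concept vector must be non-zero")
    direction = concept / norm
    # Broadcasting adds the same scaled direction to every frame.
    return hidden + strength * direction


# Toy data standing in for hidden states and a discovered concept vector.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))
concept = rng.normal(size=8)

steered = inject_concept(hidden, concept, strength=2.0)

# Measure how far each frame moved along the concept direction,
# and what remains of the change once that movement is removed.
direction = concept / np.linalg.norm(concept)
shift = (steered - hidden) @ direction
residual = (steered - hidden) - np.outer(shift, direction)
```

In this sketch each frame moves by exactly `strength` along the concept direction and not at all in directions orthogonal to it, which is why a single scalar is enough to trade off semantic strength against how far the hidden states drift from the original.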