Latent Space Disentanglement via Activation Steering for Interpretable Attribute Control in Symbolic Music Generation

Transformer-based architectures have substantially advanced the generation of complex symbolic sequences, yet fine-grained, interpretable control over discrete signal attributes remains an open gap. To bridge it, this paper investigates the mechanistic interpretability of the Multitrack Music Transformer (MMT) and proposes an inference-time activation steering framework for deterministic attribute modulation without retraining. Using the Difference-in-Means (DiffMean) method, we isolate latent directions in the residual stream for two signal attributes, Pitch and Duration. We validate the Linear Representation Hypothesis in this domain, observing a high correlation between steering magnitude and attribute shift. To address the feature entanglement inherent in multi-attribute steering, we introduce a Dual Steering framework based on Gram-Schmidt orthogonalization. Experimental results show that this geometric decoupling reduces conceptual interference and signal degradation relative to naive vector addition, enabling independent, deterministic control even against strong autoregressive conditioning.
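The two geometric steps named in the abstract, a Difference-in-Means direction per attribute followed by a Gram-Schmidt step that decouples the second direction from the first, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the activations are synthetic stand-ins for MMT residual-stream vectors, and the function names (`diff_mean_direction`, `orthogonalize`, `steer`), the hidden size, and the steering coefficients are all assumptions chosen for illustration.

```python
import numpy as np

def diff_mean_direction(pos_acts, neg_acts):
    """Unit vector from the mean of neg_acts to the mean of pos_acts."""
    d = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def orthogonalize(v, u):
    """Gram-Schmidt step: remove the component of v along u, then renormalize."""
    u_hat = u / np.linalg.norm(u)
    w = v - np.dot(v, u_hat) * u_hat
    return w / np.linalg.norm(w)

def steer(hidden, directions, alphas):
    """Add each direction, scaled by its coefficient, to a hidden state."""
    out = hidden.copy()
    for d, a in zip(directions, alphas):
        out = out + a * d
    return out

# Synthetic activations (assumption): the "duration" axis deliberately
# overlaps with the "pitch" axis to mimic feature entanglement.
rng = np.random.default_rng(0)
d_model = 16
pitch_axis = np.zeros(d_model)
pitch_axis[0] = 1.0
dur_axis = np.zeros(d_model)
dur_axis[0], dur_axis[1] = 0.6, 0.8

high_pitch = rng.normal(size=(200, d_model)) + 2.0 * pitch_axis
low_pitch = rng.normal(size=(200, d_model)) - 2.0 * pitch_axis
long_dur = rng.normal(size=(200, d_model)) + 2.0 * dur_axis
short_dur = rng.normal(size=(200, d_model)) - 2.0 * dur_axis

v_pitch = diff_mean_direction(high_pitch, low_pitch)
v_dur = diff_mean_direction(long_dur, short_dur)

# Decouple duration from pitch before steering with both at once.
v_dur_orth = orthogonalize(v_dur, v_pitch)

hidden = rng.normal(size=d_model)
steered = steer(hidden, [v_pitch, v_dur_orth], [3.0, 1.5])

print("overlap before:", float(np.dot(v_pitch, v_dur)))
print("overlap after: ", float(np.dot(v_pitch, v_dur_orth)))
```

In this toy setup, adding the raw `v_dur` would also move the hidden state along `v_pitch`. After orthogonalization, the shift along `v_pitch` is set by the pitch coefficient alone, which is the kind of independence the Dual Steering framework aims for. In an actual model the addition would be applied to the residual stream at a chosen layer during inference, a detail this sketch leaves out.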

