Transformer-based architectures have substantially advanced the generation of complex symbolic sequences, yet fine-grained, interpretable control over discrete signal attributes remains an open problem. This paper investigates the mechanistic interpretability of the Multitrack Music Transformer (MMT) and proposes a framework that bridges this gap through inference-time activation steering, enabling deterministic attribute modulation without retraining. Using the Difference-in-Means (DiffMean) method, we isolate latent directions in the residual stream for two signal attributes, Pitch and Duration. We validate the Linear Representation Hypothesis in this domain, observing a high correlation between steering magnitude and attribute shift. To address the feature entanglement inherent in multi-attribute steering, we introduce a Dual Steering framework based on Gram-Schmidt orthogonalization. Experimental results show that this geometric decoupling reduces conceptual interference and signal degradation relative to naive vector addition, enabling independent, deterministic control even under strong autoregressive conditioning.
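As an illustration of the geometry described above, the following is a minimal, dependency-free sketch rather than the paper's implementation. It computes a DiffMean direction as the difference between the mean activations of two contrastive sets, removes from the Duration direction its component along the Pitch direction with one Gram-Schmidt step, and adds the scaled directions to a residual-stream vector. The function names, the toy three-dimensional activations, the steering coefficients, and the choice to orthogonalize Duration against Pitch (rather than the reverse) are all assumptions made for this example; the abstract does not specify them.

```python
# Hypothetical sketch of DiffMean + Gram-Schmidt dual steering.
# All names and numbers below are illustrative assumptions.

def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def diff_mean(positive, negative):
    """DiffMean direction: mean(positive set) - mean(negative set)."""
    mp, mn = mean_vector(positive), mean_vector(negative)
    return [a - b for a, b in zip(mp, mn)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthogonalize(v, against):
    """One Gram-Schmidt step: subtract from v its projection onto `against`."""
    scale = dot(v, against) / dot(against, against)
    return [a - scale * b for a, b in zip(v, against)]

def steer(hidden, directions, alphas):
    """Add each direction, scaled by its coefficient, to a hidden state."""
    out = list(hidden)
    for direction, alpha in zip(directions, alphas):
        out = [h + alpha * d for h, d in zip(out, direction)]
    return out

# Toy residual-stream activations (assumed, 3-dimensional for readability).
high_pitch = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
low_pitch = [[0.0, 1.0, 0.0], [2.0, 1.0, 0.0]]
long_duration = [[1.0, 3.0, 0.0], [3.0, 3.0, 0.0]]
short_duration = [[0.0, 1.0, 0.0], [2.0, 1.0, 0.0]]

pitch_dir = diff_mean(high_pitch, low_pitch)              # [2.0, 0.0, 0.0]
duration_dir = diff_mean(long_duration, short_duration)   # [1.0, 2.0, 0.0], overlaps with pitch
duration_orth = orthogonalize(duration_dir, pitch_dir)    # [0.0, 2.0, 0.0]

hidden = [0.5, 0.5, 0.5]
naive = steer(hidden, [pitch_dir, duration_dir], [1.0, 0.5])
decoupled = steer(hidden, [pitch_dir, duration_orth], [1.0, 0.5])
```

In this toy setting, naive addition moves the hidden state along the Pitch axis by more than the Pitch coefficient alone would, because the raw Duration direction has a component along Pitch. After orthogonalization, the Duration term no longer changes the projection onto the Pitch direction. In a real model the same update would be applied to the residual stream at a chosen layer during generation, for example through a forward hook; the layer and the coefficients would need to be selected empirically.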