State Space Models (SSMs) have emerged as an efficient alternative to the Transformer architecture. Recent studies show that SSMs can match or surpass Transformers on code understanding tasks, such as code retrieval, when trained under similar conditions. However, their internal mechanisms remain a black box. We present the first systematic analysis of what SSM-based code models actually learn, along with the first comparative analysis of SSM- and Transformer-based code models. Our analysis reveals that SSMs outperform Transformers at capturing code syntax and semantics during pretraining, but forget certain syntactic and semantic relations during fine-tuning on downstream tasks, especially when the task emphasizes short-range dependencies. To diagnose this, we introduce SSM-Interpret, a frequency-domain framework that exposes a spectral shift toward short-range dependencies during fine-tuning. Guided by these findings, we propose architectural modifications that significantly improve the performance of SSM-based code models, validating that our analysis directly enables better models.
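To make the frequency-domain intuition concrete, the following is a minimal illustrative sketch (not the paper's SSM-Interpret implementation) of how a spectral shift can be measured: a diagonal SSM layer's impulse response is h[t] = Σᵢ Cᵢ·Aᵢᵗ·Bᵢ, and the fraction of its spectral energy at high frequencies serves as a proxy for short-range emphasis. The decay values and cutoff below are hypothetical, chosen only to illustrate the contrast between slow-decaying (long-range) and fast-decaying (short-range) state dynamics.

```python
import numpy as np

def ssm_impulse_response(A, B, C, length):
    """Impulse response of a diagonal SSM: h[t] = sum_i C_i * A_i**t * B_i."""
    t = np.arange(length)
    return np.sum(C[:, None] * (A[:, None] ** t) * B[:, None], axis=0)

def high_freq_energy_ratio(h):
    """Fraction of spectral energy in the upper half of the spectrum."""
    spectrum = np.abs(np.fft.rfft(h)) ** 2
    cutoff = len(spectrum) // 2  # illustrative cutoff frequency
    return spectrum[cutoff:].sum() / spectrum.sum()

# Slow-decaying states (A near 1) yield long-range, low-frequency kernels.
A_pretrained = np.array([0.99, 0.95, 0.90])
# Fast-decaying states (A near 0) yield short-range, broadband kernels.
A_finetuned = np.array([0.50, 0.30, 0.10])
B = np.ones(3)
C = np.ones(3)

r_pre = high_freq_energy_ratio(ssm_impulse_response(A_pretrained, B, C, 256))
r_ft = high_freq_energy_ratio(ssm_impulse_response(A_finetuned, B, C, 256))
# A shift toward short-range dependencies appears as r_ft > r_pre.
```

Under this toy model, fine-tuning that shrinks the state-decay magnitudes moves spectral energy toward high frequencies, which is the kind of shift the abstract attributes to fine-tuning on short-range tasks.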