Recursive (looped) Transformers decouple computational depth from parameter depth by repeatedly applying shared layers, providing an explicit architectural primitive for iterative refinement and latent reasoning. However, early looped Transformers often underperform non-recursive baselines of equal compute. While recent literature has introduced more effective recursion mechanisms to narrow this gap, existing architectures still operate at a fixed, full-token resolution, neglecting the potential efficiency of computing over compressed latent representations. In this paper, we propose SpiralFormer, a looped Transformer that executes recurrence under a multi-resolution recursion schedule. We provide probing evidence that multi-resolution recursion enables the model to learn hierarchical dependencies by inducing iteration-wise functional specialization across different scales. Empirically, SpiralFormer achieves better parameter and compute efficiency than both looped and non-looped baselines across model scales from 160M to 1.4B parameters, establishing sequence resolution as a potential axis for scaling recursive architectures.
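To make the idea of looping shared layers under a resolution schedule concrete, the following is a minimal NumPy sketch and not SpiralFormer's actual architecture. The abstract does not specify the compression operator, the order of resolutions, or how coarse updates are merged back, so mean pooling, nearest-neighbour upsampling, the residual merge, the coarse-to-fine schedule `[4, 2, 1]`, and all function names here are illustrative assumptions. Causal masking is omitted for brevity.

```python
import numpy as np

def shared_block(x, params):
    """Toy Transformer-style block: single-head self-attention plus MLP, both residual."""
    wq, wk, wv, w1, w2 = params
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    x = x + attn @ v
    return x + np.maximum(x @ w1, 0.0) @ w2

def downsample(x, factor):
    """Mean-pool non-overlapping groups of `factor` tokens (assumed compression operator)."""
    n, d = x.shape
    return x.reshape(n // factor, factor, d).mean(axis=1)

def upsample(z, factor):
    """Repeat each compressed token `factor` times to restore the full sequence length."""
    return np.repeat(z, factor, axis=0)

def looped_forward(x, params, schedule):
    """Apply the SAME block once per schedule entry, at that entry's resolution."""
    tokens_processed = []
    for factor in schedule:
        z = downsample(x, factor)              # compressed view of the sequence
        tokens_processed.append(z.shape[0])
        update = shared_block(z, params) - z   # what the block changed at this scale
        x = x + upsample(update, factor)       # merge the update back at full resolution
    return x, tokens_processed

rng = np.random.default_rng(0)
d, n = 16, 8  # hypothetical model width and sequence length
shapes = [(d, d)] * 3 + [(d, 4 * d), (4 * d, d)]
params = [rng.normal(scale=0.1, size=s) for s in shapes]
x = rng.normal(size=(n, d))

# Three loop iterations over one set of weights, from coarse to fine (assumed order).
y, tokens_processed = looped_forward(x, params, schedule=[4, 2, 1])
```

In this sketch the parameter count is fixed by `params` regardless of how many entries the schedule has, while the number of tokens each iteration attends over shrinks with the pooling factor. That is the sense in which sequence resolution becomes a separate knob from both parameter depth and loop count.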