Modular addition tasks serve as a useful testbed for observing empirical phenomena in deep learning, including \emph{grokking}. Prior work has shown that one-layer transformer architectures learn Fourier Multiplication circuits to solve modular addition tasks. In this paper, we show that Recurrent Neural Networks (RNNs) trained on modular addition also use a Fourier Multiplication strategy. We identify low-rank structures in the model weights and attribute individual model components to specific Fourier frequencies, yielding a sparse representation in Fourier space. We further show empirically that the RNN is robust to the removal of any individual frequency, whereas performance degrades sharply as more frequencies are ablated from the model.
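The Fourier Multiplication strategy rests on the angle-addition identity $\cos(\omega_k(a+b)) = \cos(\omega_k a)\cos(\omega_k b) - \sin(\omega_k a)\sin(\omega_k b)$, where $\omega_k = 2\pi k / p$ for modulus $p$. One natural way to ablate a frequency is to project its $\cos$ and $\sin$ components out of a weight matrix, viewing each column as a function on $\mathbb{Z}_p$. The following is a minimal sketch of such an ablation in Python/NumPy, assuming the target is a $(p, d)$ embedding matrix; the function name \texttt{ablate\_frequency} and the shapes are illustrative, not taken from the paper's code.

\begin{verbatim}
import numpy as np

def ablate_frequency(W: np.ndarray, k: int, p: int) -> np.ndarray:
    """Project Fourier frequency k (1 <= k <= (p-1)//2) out of a
    (p, d) weight matrix W, treating each column as a function
    on Z_p."""
    x = np.arange(p)
    cos_k = np.cos(2 * np.pi * k * x / p)
    sin_k = np.sin(2 * np.pi * k * x / p)
    cos_k /= np.linalg.norm(cos_k)  # unit-norm basis vectors
    sin_k /= np.linalg.norm(sin_k)
    # Subtract each column's projection onto the frequency-k subspace.
    return W - np.outer(cos_k, cos_k @ W) - np.outer(sin_k, sin_k @ W)

# Illustrative usage: remove frequency k = 5 from a random embedding.
p, d = 113, 128
W = np.random.default_rng(0).standard_normal((p, d))
W_ablated = ablate_frequency(W, k=5, p=p)
\end{verbatim}

Re-evaluating accuracy after each such projection is the kind of procedure underlying the robustness claim above: removing any single frequency leaves performance largely intact, while removing several degrades it sharply.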