This paper explores the potential of recurrent neural networks (RNNs) and other subquadratic architectures as competitive alternatives to transformer-based models in low-resource language modeling scenarios. We use HGRN2 (Qin et al., 2024), a recently proposed RNN-based architecture, and comparatively evaluate its effectiveness against transformer-based baselines and other subquadratic architectures (LSTM, xLSTM, Mamba). Our experimental results show that BABYHGRN, our HGRN2 language model, outperforms transformer-based models in both the 10M and 100M word tracks of the BabyLM challenge, as measured by performance on the BLiMP, EWoK, GLUE, and BEAR benchmarks. We further show the positive impact of knowledge distillation. Our findings challenge the prevailing focus on transformer architectures and indicate the viability of RNN-based models, particularly in resource-constrained environments.