The success of large-scale language models like GPT can be attributed to their ability to efficiently predict the next token in a sequence. However, these models expend a fixed amount of computation per token, regardless of how difficult that token is to predict, and lack any capacity for iterative refinement. In this paper, we introduce a novel Loop Neural Network, which achieves better performance by spending longer computation time without increasing the model size. Our approach revisits the input multiple times, refining the prediction by iteratively looping over a subset of the model with residual connections. We demonstrate the effectiveness of this method through experiments comparing versions of GPT-2 with our loop models, showing improved performance on language modeling tasks at similar parameter counts. Importantly, these improvements are achieved without any additional training data.
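The core mechanism described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's implementation: `block` stands in for the reused subset of Transformer layers, and `loop_forward` shows how the same parameters are applied repeatedly with a residual connection, so compute grows with the loop count while the parameter count stays fixed.

```python
def block(x):
    # Hypothetical inner computation; in the real model this would be
    # a stack of attention + MLP layers with shared weights across loops.
    return [0.5 * v for v in x]

def loop_forward(x, n_loops):
    # Re-apply the same block n_loops times, adding its output back to
    # its input (residual connection) on each pass. More loops means
    # more computation, but no new parameters.
    for _ in range(n_loops):
        x = [xi + bi for xi, bi in zip(x, block(x))]
    return x
```

For example, `loop_forward([1.0], 2)` applies the residual update twice: 1.0 → 1.5 → 2.25.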