Transformers have been shown to solve data-fitting problems from various (latent) models in context, as reported by Garg et al. However, the transformer architecture has no inherent iterative structure, which makes it difficult to emulate the iterative algorithms commonly employed in traditional machine learning methods. To address this, we propose a looped transformer architecture and an associated training methodology that incorporate iterative characteristics into the transformer. Experimental results suggest that the looped transformer achieves performance comparable to the standard transformer on various data-fitting problems while using less than 10% of the parameter count.
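To make the idea of looping concrete, the following is a minimal NumPy sketch of weight sharing across iterations, not the authors' implementation. It assumes a toy block (single-head self-attention plus a ReLU MLP, with no layer normalization, masking, or training); the names `init_block`, `block`, `standard_forward`, `looped_forward`, and `count_params` are hypothetical and chosen only for illustration.

```python
import numpy as np

def init_block(d_model, rng):
    """Random weights for one toy block (hypothetical layout, not the paper's)."""
    s = 1.0 / np.sqrt(d_model)
    return {
        "wq": rng.normal(0.0, s, (d_model, d_model)),
        "wk": rng.normal(0.0, s, (d_model, d_model)),
        "wv": rng.normal(0.0, s, (d_model, d_model)),
        "w1": rng.normal(0.0, s, (d_model, 4 * d_model)),
        "w2": rng.normal(0.0, 1.0 / np.sqrt(4 * d_model), (4 * d_model, d_model)),
    }

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def block(params, x):
    """Single-head self-attention, then a ReLU MLP, each with a residual connection."""
    q, k, v = x @ params["wq"], x @ params["wk"], x @ params["wv"]
    attn = softmax(q @ k.T / np.sqrt(x.shape[-1]))
    x = x + attn @ v
    return x + np.maximum(x @ params["w1"], 0.0) @ params["w2"]

def standard_forward(layers, x):
    """Standard stack: every layer has its own parameters."""
    for params in layers:
        x = block(params, x)
    return x

def looped_forward(params, x, n_loops):
    """Looped variant: the same block (same parameters) is applied n_loops times."""
    for _ in range(n_loops):
        x = block(params, x)
    return x

def count_params(params):
    """Total number of scalar weights in one block."""
    return sum(w.size for w in params.values())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, depth = 8, 12  # toy sizes, assumed for illustration only
    shared = init_block(d_model, rng)
    stack = [init_block(d_model, rng) for _ in range(depth)]
    x = rng.normal(size=(5, d_model))  # 5 tokens of width d_model
    print("looped output shape:", looped_forward(shared, x, depth).shape)
    print("looped params:", count_params(shared))
    print("stacked params:", sum(count_params(p) for p in stack))
```

In this toy setting, one shared block applied 12 times holds one twelfth of the weights of a 12-block stack at the same effective depth. This only illustrates how looping decouples depth from parameter count; it does not reproduce the configuration or training procedure behind the reported results.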