Transformers have proven effective at solving data-fitting problems in-context for various (latent) models, as reported by Garg et al. However, the transformer architecture has no inherent iterative structure, which makes it difficult to emulate the iterative algorithms commonly employed in traditional machine learning methods. To address this, we propose a looped transformer architecture and an associated training methodology that build iterative characteristics into the transformer. Experimental results suggest that the looped transformer achieves performance comparable to the standard transformer on various data-fitting problems while using less than 10% of the parameter count.
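To make the idea of "looping" concrete, the following is a minimal NumPy sketch under stated assumptions, not the architecture or training procedure of the work summarized above. It applies one weight-shared block repeatedly and compares its parameter count with a stack of independent blocks. The single-head attention, the omission of layer normalization, the re-adding of the prompt at every iteration, and the 12-layer baseline are all illustrative assumptions; the names `init_block`, `block`, `looped_forward`, `stacked_forward`, and `count_params` are hypothetical.

```python
import numpy as np


def init_block(d, rng):
    """Random weights for one single-head attention + MLP block (illustrative sizes)."""
    s = 1.0 / np.sqrt(d)
    return {
        "Wq": rng.normal(0.0, s, (d, d)),
        "Wk": rng.normal(0.0, s, (d, d)),
        "Wv": rng.normal(0.0, s, (d, d)),
        "W1": rng.normal(0.0, s, (d, 4 * d)),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(4 * d), (4 * d, d)),
    }


def softmax(x):
    """Row-wise softmax; masked entries equal to -inf become exactly 0."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def block(params, x):
    """One residual block: causal self-attention followed by a ReLU MLP.

    Layer normalization is omitted for brevity (an assumption of this sketch).
    """
    n, d = x.shape
    q, k, v = x @ params["Wq"], x @ params["Wk"], x @ params["Wv"]
    scores = q @ k.T / np.sqrt(d)
    future = np.triu(np.ones((n, n), dtype=bool), k=1)  # positions j > i
    scores = np.where(future, -np.inf, scores)
    x = x + softmax(scores) @ v
    x = x + np.maximum(x @ params["W1"], 0.0) @ params["W2"]
    return x


def looped_forward(params, prompt, n_loops):
    """Apply the SAME block n_loops times.

    The prompt is re-added before every iteration; this is an assumed design
    choice of the sketch, not something stated in the text above.
    """
    y = np.zeros_like(prompt)
    for _ in range(n_loops):
        y = block(params, y + prompt)
    return y


def stacked_forward(layers, prompt):
    """Standard (non-looped) baseline: one independent block per layer."""
    y = prompt
    for params in layers:
        y = block(params, y)
    return y


def count_params(layers):
    """Total number of scalar weights across a list of blocks."""
    return sum(w.size for params in layers for w in params.values())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    shared = init_block(d, rng)                        # looped model: one block
    stack = [init_block(d, rng) for _ in range(12)]    # hypothetical 12-layer baseline
    prompt = rng.normal(size=(5, d))                   # 5 in-context tokens
    print(looped_forward(shared, prompt, n_loops=3).shape)
    print(count_params([shared]) / count_params(stack))
```

In this toy setup, sharing one block across iterations means the block weights are one twelfth of those in the assumed 12-layer stack, which is one way a looped model can stay below a 10% parameter budget. The number of loop iterations changes compute but not the parameter count.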