Transformers have proven effective at solving data-fitting problems \emph{in context} for various (latent) models, as reported by Garg et al. However, the transformer architecture has no inherent iterative structure, which makes it difficult to emulate the iterative algorithms commonly employed in traditional machine learning methods. To address this, we propose a \emph{looped} transformer architecture and an associated training methodology that incorporate iterative characteristics into the transformer. Experimental results suggest that the looped transformer achieves performance comparable to the standard transformer on various data-fitting problems while using less than 10\% of the parameter count.
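To make the weight-sharing idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simplified single-head block (pre-normalization, no causal mask, no biases) and a hypothetical 12-layer standard baseline; the helper names (`init_block`, `block`, `looped_forward`, `count_params`) are illustrative. It compares a standard stack of distinct blocks with a looped model that applies one shared block repeatedly. The parameter ratio it prints follows from the assumed depth alone and does not reproduce the reported experiments or the training methodology.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def layer_norm(a):
    """Parameter-free normalization of each row to zero mean, unit variance."""
    out = []
    for row in a:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + 1e-5) for v in row])
    return out


def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def init_block(d, rng):
    """Random weights for one simplified block (assumed design, width d)."""
    def mat(rows, cols):
        scale = 1.0 / math.sqrt(rows)
        return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]
    return {"Wq": mat(d, d), "Wk": mat(d, d), "Wv": mat(d, d),
            "W1": mat(d, 4 * d), "W2": mat(4 * d, d)}


def block(p, x):
    """Self-attention followed by a ReLU MLP, each with a residual connection."""
    h = layer_norm(x)
    q, k, v = matmul(h, p["Wq"]), matmul(h, p["Wk"]), matmul(h, p["Wv"])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scale = math.sqrt(len(x[0]))
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    x = add(x, matmul(softmax_rows(scores), v))
    hidden = [[max(u, 0.0) for u in row] for row in matmul(layer_norm(x), p["W1"])]
    return add(x, matmul(hidden, p["W2"]))


def standard_forward(layers, x):
    """Standard transformer: every layer has its own parameters."""
    for p in layers:
        x = block(p, x)
    return x


def looped_forward(p, x, n_loops):
    """Looped transformer: one shared block applied n_loops times."""
    for _ in range(n_loops):
        x = block(p, x)
    return x


def count_params(layers):
    """Total number of scalar weights across a list of blocks."""
    return sum(len(w) * len(w[0]) for p in layers for w in p.values())


if __name__ == "__main__":
    rng = random.Random(0)
    d, depth = 8, 12  # assumed width and baseline depth, for illustration only
    layers = [init_block(d, rng) for _ in range(depth)]
    shared = init_block(d, rng)
    x = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(5)]
    y_loop = looped_forward(shared, x, n_loops=depth)
    ratio = count_params([shared]) / count_params(layers)
    print(f"output shape: {len(y_loop)} x {len(y_loop[0])}, parameter ratio: {ratio:.3f}")
```

Under these assumptions the looped model performs as many block applications as the baseline while storing only one block's weights, so its parameter count scales with the width of the block rather than with the number of iterations.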