Transformers have revolutionized deep learning and generative modeling, enabling unprecedented advances in natural language processing and beyond. However, designing hardware accelerators for transformer models is challenging due to the wide variety of computational kernels involved in the transformer architecture. Existing accelerators are either inadequate for accelerating end-to-end transformer models or suffer from notable thermal limitations. In this paper, we propose HeTraX, a three-dimensional heterogeneous architecture specifically optimized to accelerate end-to-end transformer models. HeTraX employs hardware resources aligned with the computational kernels of transformers and optimizes both performance and energy. Experimental results show that HeTraX outperforms the existing state-of-the-art by up to 5.6x in speedup and improves the energy-delay product (EDP) by 14.5x while ensuring thermal feasibility.