The rapid growth of Large Language Models (LLMs) has intensified the need for specialized hardware accelerators that can satisfy stringent inference latency and power constraints. Although matrix multiplications dominate the overall computational workload, non-linear vector normalization operations such as LayerNorm, RMSNorm, and Softmax can become critical hardware bottlenecks. Existing accelerators typically implement each of these functions in a dedicated hardware block, which duplicates resources and leads to inefficient silicon utilization. To address this limitation, we propose the Minimalist Integer Vector Engine (MIVE), a programmable architecture capable of executing all three operations within a unified datapath. By exploiting computational patterns common to LayerNorm, RMSNorm, and Softmax, the proposed vector engine maximizes hardware sharing while reducing implementation overhead. Physical ASIC implementation results show that MIVE provides comprehensive multi-function support while achieving higher area efficiency and hardware efficiency than most state-of-the-art standalone accelerators.
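The abstract does not describe MIVE's datapath in detail. As a rough illustration of the kind of shared pattern it refers to, the sketch below expresses all three operations as the same three stages: a vector reduction, a single scalar non-linearity, and an element-wise subtract-then-scale pass. This is a floating-point reference sketch under stated assumptions, not MIVE's integer implementation: the function name `unified_vector_op`, the three-stage split, and the omission of the learned affine parameters (gamma, beta) are all illustrative choices.

```python
import math
from typing import List, Sequence


def _identity(d: float) -> float:
    return d


def unified_vector_op(x: Sequence[float], op: str, eps: float = 1e-5) -> List[float]:
    """Hypothetical sketch: LayerNorm, RMSNorm or Softmax via one shared flow.

    Stage 1: vector reduction producing a centre value and a statistic.
    Stage 2: one scalar non-linear function (1/sqrt(.) or 1/x).
    Stage 3: element-wise pass computing f(x_i - centre) * scale.
    Floating point only; gamma/beta are omitted for brevity.
    """
    n = len(x)
    if n == 0:
        raise ValueError("input vector must be non-empty")

    # Stage 1: reductions (mean + variance, mean square, or max + exp-sum).
    if op == "layernorm":
        centre = sum(x) / n
        stat = sum((v - centre) ** 2 for v in x) / n   # variance
    elif op == "rmsnorm":
        centre = 0.0                                   # no mean subtraction
        stat = sum(v * v for v in x) / n               # mean square
    elif op == "softmax":
        centre = max(x)                                # max subtraction for stability
        stat = sum(math.exp(v - centre) for v in x)    # sum of exponentials
    else:
        raise ValueError(f"unsupported op: {op}")

    # Stage 2: a single scalar non-linearity per vector.
    if op == "softmax":
        scale = 1.0 / stat
        elementwise = math.exp   # exponentials recomputed here, not buffered
    else:
        scale = 1.0 / math.sqrt(stat + eps)
        elementwise = _identity

    # Stage 3: shared element-wise subtract-then-scale pass.
    return [elementwise(v - centre) * scale for v in x]
```

For example, `unified_vector_op([1.0, 2.0, 3.0], "softmax")` and `unified_vector_op([1.0, 2.0, 3.0], "layernorm")` go through the same reduction, scalar, and element-wise stages and differ only in which reduction and which scalar function are selected. That selection is the kind of sharing a unified datapath can exploit.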