The performance of AI accelerators is increasingly limited by data movement, memory access, and orchestration overheads rather than by raw compute capability. This paper presents MAVeC, a messaging-based adaptive vector computing accelerator designed to support streaming execution and runtime configurability for AI workloads. MAVeC replaces centralized control with a message-driven execution model in which data and control propagate together across distributed hardware elements, enabling autonomous execution, flexible routing, and efficient coordination. We validate MAVeC's core hardware constructs and execution model on matrix multiplication and convolution workloads using a cycle-accurate, system-level ASIC design in TSMC 28 nm, capturing computation, communication, and reduction. MAVeC sustains greater than 97 percent array utilization across hardware scales and problem sizes by translating spatial capacity into effective computation. Once inputs are brought on-chip, over 90 percent of communication remains on-chip through coordinated temporal reuse, spatial multicast, and on-fabric partial-sum reduction. On a 64x64 SiteO array, MAVeC sustains over 5 TFLOPS while reducing end-to-end latency. Compared to TPU-style systolic arrays and MEISSA under compute-centric models, MAVeC achieves 1.5-2x lower latency. Against optimized NVIDIA H100 FP32 kernels, MAVeC sustains 5.8-6.1 TFLOPS, delivering a consistent 6.0-7.2x throughput advantage across problem sizes. Energy results show that MAVeC converts higher instantaneous power into lower total energy by shortening execution time and amortizing data movement. These results demonstrate that message-driven execution provides an effective architectural foundation for overcoming data movement and orchestration bottlenecks, enabling scalable, high-utilization accelerators for future AI workloads.