This paper introduces a new paradigm of chip design for the semiconductor industry called Data-Rich Analytics Based Computer Architecture (BRYT). The goal is to monitor chip hardware behavior in the field, at real-time speeds with no slowdown and minimal power overhead, and to obtain insights into chip behavior and workloads. The paradigm is motivated by the end of Moore's Law and Dennard scaling, which necessitates architectural efficiency as the means of improving capability over the next decade or two. This paper implements the first version of the paradigm with a system architecture and the concept of an analYtics Processing Unit (YPU). We perform four case studies and implement an RTL-level prototype. Across the case studies, we show that a YPU with an area overhead of <3% at 7 nm and an overall power consumption of <25 mW can produce previously inconceivable data: PICS stacks of arbitrary programs, evaluations of instruction prefetchers in the wild before deployment, fine-grained cycle-by-cycle utilization of hardware modules, and histograms of tensor-value distributions of DL models.