Nowadays, we are witnessing an artificial intelligence (AI) revolution that dominates the technology landscape across application domains such as healthcare, robotics, automotive, security, and defense. Massive-scale AI models, which mimic the functionality of the human brain, typically have millions or even billions of parameters and rely on data-intensive matrix multiplication. While conventional von Neumann architectures struggle with the memory wall and the end of Moore's Law, these AI applications are rapidly migrating towards the edge, for example to robots and to unmanned aerial vehicles for surveillance, which further constrains the hardware budget of edge AI architectures. Although in-memory computing has been proposed as a promising solution to the memory wall, both analog and digital in-memory computing architectures lose much of their promised benefit due to various design limitations. We propose DISCA, a new digital in-memory stochastic computing architecture that uses a compressed version of the quasi-stochastic Bent-Pyramid data format. DISCA inherits the computational simplicity of analog computing while preserving the scalability, productivity, and reliability of digital systems. Post-layout modeling of DISCA in a commercial 180 nm CMOS technology shows an energy efficiency of 3.59 TOPS/W per bit at 500 MHz. Therefore, when scaled and compared with its counterpart architectures, DISCA can improve the energy efficiency of matrix multiplication workloads by orders of magnitude.
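To illustrate why stochastic computing offers analog-like computational simplicity, the sketch below shows generic, textbook unipolar stochastic multiplication: a value in [0, 1] is encoded as the fraction of 1s in a bitstream, and multiplying two independent streams needs only a bitwise AND. It also shows a deterministic (clock-division) variant that removes random fluctuation and yields the exact product. This is an assumption-laden illustration, not the DISCA design: the abstract does not specify the Bent-Pyramid format, and we do not claim it works this way. All function names (`to_bitstream`, `sc_multiply`, `deterministic_multiply`, etc.) are hypothetical.

```python
import random


def to_bitstream(p, length, rng):
    """Encode p in [0, 1] as a random unipolar bitstream (hypothetical helper)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def from_bitstream(bits):
    """Decode a unipolar bitstream: the value is the fraction of 1s."""
    return sum(bits) / len(bits)


def sc_multiply(a_bits, b_bits):
    """Multiply two unipolar streams with a bitwise AND.

    For independent random streams, P(a AND b) = P(a) * P(b),
    so the decoded result approximates the product.
    """
    return [x & y for x, y in zip(a_bits, b_bits)]


def deterministic_multiply(a_ones, b_ones, n):
    """Exact product of (a_ones / n) and (b_ones / n) via clock division.

    This is one generic deterministic SC technique, shown only as an
    illustration; it is NOT claimed to be the Bent-Pyramid format.
    """
    a_code = [1] * a_ones + [0] * (n - a_ones)  # unary code of A
    b_code = [1] * b_ones + [0] * (n - b_ones)  # unary code of B
    # Stream A: its n-bit code repeated n times (length n * n).
    a_stream = a_code * n
    # Stream B: each bit of its code held for n consecutive cycles.
    b_stream = [bit for bit in b_code for _ in range(n)]
    # Every 1 in A overlaps every 1-block of B exactly once,
    # giving a_ones * b_ones ones out of n * n bits.
    return sc_multiply(a_stream, b_stream)


if __name__ == "__main__":
    rng = random.Random(0)
    a = to_bitstream(0.5, 4096, rng)
    b = to_bitstream(0.25, 4096, rng)
    print("random SC estimate of 0.5 * 0.25:", from_bitstream(sc_multiply(a, b)))
    print("deterministic SC 0.5 * 0.25:", from_bitstream(deterministic_multiply(8, 4, 16)))
```

The random variant trades accuracy for stream length (its error shrinks roughly with the square root of the length), whereas the deterministic variant is exact but needs n × n cycles for n-level inputs; this trade-off is one motivation for quasi-stochastic and compressed formats in general.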