As CMOS technology scales into the nanometer regime, aging effects and process variations have become increasingly pronounced, posing significant reliability challenges for AI accelerators. Traditional guardband-based design approaches, which rely on pessimistic timing margins, sacrifice substantial performance and computational efficiency, rendering them inadequate for high-performance AI computing demands. Current reliability-aware AI accelerator design faces two core challenges: (1) the lack of systematic cross-layer analysis tools to capture coupled reliability effects across the device, circuit, architecture, and application layers; and (2) the fundamental trade-off between conventional reliability optimization and computational efficiency. To address these challenges, this paper systematically presents a series of reliability-aware accelerator designs, encompassing (1) an aging- and variation-aware dynamic timing analyzer, (2) accelerator dataflow optimization via critical input pattern reduction, and (3) resilience characterization and novel architecture design for large language models (LLMs). By tightly integrating cross-layer reliability modeling with AI workload characteristics, these co-optimization approaches effectively achieve reliable and efficient AI acceleration.