Fast-evolving artificial intelligence (AI) algorithms, such as large language models, are driving ever-increasing computing demands in today's data centers. Heterogeneous computing with domain-specific architectures (DSAs) offers many opportunities for scaling computing systems up and out. In particular, heterogeneous chiplet architectures are favored because they allow systems to keep scaling up and out while reducing the design complexity and cost of traditional monolithic chip design. However, success depends on how computing resources are interconnected and how heterogeneous chiplets are orchestrated. In this paper, we first discuss the diversity and evolving demands of different AI workloads and how chiplets bring better cost efficiency and shorter time to market. We then discuss the challenges in establishing chiplet interface standards, in packaging, and in security. Finally, we discuss the software programming challenges in chiplet systems.