Fast-evolving artificial intelligence (AI) algorithms, such as large language models, are driving ever-increasing computing demands in today's data centers. Heterogeneous computing with domain-specific architectures (DSAs) opens many opportunities for scaling computing systems both up and out. In particular, heterogeneous chiplet architectures are favored because they allow continued scale-up and scale-out while reducing the design complexity and cost of traditional monolithic chip design. However, success hinges on how computing resources are interconnected and how heterogeneous chiplets are orchestrated. In this paper, we first discuss the diversity and evolving demands of different AI workloads, and how chiplets bring better cost efficiency and shorter time to market. We then discuss the challenges in establishing chiplet interface standards, in packaging, and in security. Finally, we discuss the software programming challenges in chiplet systems.