Integrating CPU and GPU technologies is a key enabler for modern AI and graphics workloads, combining control-oriented processing with massively parallel compute capability. As systems evolve toward chiplet-based architectures, pre-silicon validation of tightly coupled CPU-GPU subsystems becomes increasingly challenging. Complex validation framework setup, large design scale, high concurrency, non-deterministic execution, and intricate protocol interactions at chiplet boundaries often result in long integration cycles. This paper presents a replay-driven validation methodology developed during the integration of a CPU subsystem, multiple Xe GPU cores, and a configurable Network-on-Chip (NoC) within a foundational SoC building block targeting the ODIN integrated chiplet architecture. Deterministic waveform capture and replay across both simulation and emulation, using a single design database, allows complex GPU workloads and protocol sequences to be reproduced reliably at the system level. This approach significantly accelerates debug, improves integration confidence, and enables end-to-end system boot and workload execution within a single quarter, demonstrating the effectiveness of replay-based validation as a scalable methodology for chiplet-based systems.