Designing and validating efficient cache-coherent memory subsystems is a critical yet complex task in the development of modern multi-core system-on-chip architectures. Rhea is a unified framework that streamlines the design and system-level validation of RTL cache-coherent memory subsystems. On the design side, Rhea generates synthesizable, highly configurable RTL that supports a range of architectural parameters. On the validation side, Rhea integrates Verilator's cycle-accurate RTL simulation with gem5's full-system simulation, allowing realistic workloads and operating systems to run alongside the actual RTL under test. We apply Rhea to design MSI-based RTL memory subsystems with one and two levels of private caches, scaling up to sixteen cores. Evaluated on 22 applications from state-of-the-art benchmark suites, these designs achieve performance that falls between gem5 Ruby's MI and MOESI models. The hybrid gem5-Verilator co-simulation flow incurs a moderate simulation overhead of up to 2.7x relative to gem5 MI, but achieves higher fidelity by simulating real RTL hardware. This overhead decreases with scale, down to 1.6x in sixteen-core scenarios. These results demonstrate Rhea's effectiveness and scalability in enabling fast development of RTL cache-coherent memory subsystem designs.