The rapid advancement of AI workloads and domain-specific architectures has led to increasingly diverse processor microarchitectures, whose design exploration requires fast and accurate performance validation. However, traditional workflows defer validation until RTL design and SoC integration are complete, significantly prolonging development and iteration cycles. In this work, we present FASE (FPGA-Assisted Syscall Emulation), the first framework to adapt syscall emulation to FPGA platforms. It enables complex multi-threaded benchmarks to run directly on the processor design, without integrating an SoC or a target OS, for early-stage performance validation. FASE introduces three key innovations, each addressing a critical challenge in adapting syscall emulation to FPGAs: (1) only a minimal CPU interface is exposed, leaving other hardware components untouched, which addresses the lack of a unified hardware interface in FPGA systems; (2) a Host-Target Protocol (HTP) minimizes cross-device data traffic, mitigating the low-bandwidth, high-latency communication between the FPGA and the host; and (3) a host-side runtime remotely handles Linux-style system calls, addressing the challenge of cross-device syscall delegation. Experiments were conducted on a Xilinx FPGA with Rocket, an open-source RISC-V SMP processor. On single-threaded CoreMark, FASE introduces less than 1% performance error and, thanks to FPGA acceleration, is over 2000x more efficient than Proxy Kernel. On complex OpenMP benchmarks, FASE achieves over 96% performance validation accuracy for most single-threaded workloads and over 91.5% for most multi-threaded workloads compared to full SoC validation, significantly reducing development complexity and time-to-feedback. All components of the FASE framework are released as open source.
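The cross-device syscall delegation described above can be illustrated with a minimal host-side sketch. This is not FASE's actual runtime or the real HTP wire format: the request layout (`REQUEST`), the `TargetMemory` stand-in, and the `handle_request` helper are all assumptions made for illustration. Only the syscall numbers for `write` (64) and `exit` (93) are taken from the RISC-V Linux syscall table.

```python
import os
import struct

# Hypothetical request layout (NOT the real HTP): syscall number followed by
# three 64-bit arguments, little-endian.
REQUEST = struct.Struct("<4Q")

SYS_WRITE = 64  # RISC-V Linux syscall numbers
SYS_EXIT = 93
ENOSYS = 38     # Linux errno for an unimplemented syscall


class TargetMemory:
    """Stand-in for target memory that the host would reach over the FPGA link."""

    def __init__(self, size):
        self.data = bytearray(size)

    def read(self, addr, length):
        return bytes(self.data[addr:addr + length])


def handle_request(raw, mem, out_fd):
    """Decode one delegated syscall and service it on the host.

    Returns (return_value, target_exited).
    """
    num, a0, a1, a2 = REQUEST.unpack(raw)
    if num == SYS_WRITE:
        # a1 = buffer address in target memory, a2 = length.
        # The target's fd (a0) is ignored here; output goes to out_fd.
        payload = mem.read(a1, a2)
        return os.write(out_fd, payload), False
    if num == SYS_EXIT:
        # a0 carries the exit status of the target program.
        return a0, True
    # Anything this sketch does not model is reported as unimplemented.
    return -ENOSYS, False
```

In this sketch the host fetches only the bytes named by the syscall arguments, which reflects the general idea of keeping cross-device traffic small; how FASE actually achieves this is defined by HTP and is not modeled here.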