Profiling is essential for performance optimization because it provides real-time observation and measurement of key hardware-execution parameters. Existing profiling tools for High-Level Synthesis (HLS) IPs running on FPGAs are far less mature than those developed for fixed CPU and GPU architectures, and they lag behind mainly because of the FPGA's dynamic architecture. This limitation shows in the typical approaches: routing each monitoring signal off the FPGA device individually through a dedicated port, using one BRAM per signal for temporary storage, or embedding vendor-specific primitives and analyzing the captured waveforms manually. In this paper, we propose a systematic profiling method tailored to the dynamic nature of FPGA systems and particularly suited to streaming accelerators. Instead of relying on signal extraction, the proposed profiling stream flows alongside the actual data, splitting and merging dynamically in synchrony with the data stream, and is ultimately directed to the processing system (PS) side. We conducted a preliminary evaluation of this method on randomly interconnected neural networks (RINNs) using the FIFO fullness metric, validated with co-simulation results.
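The abstract does not specify how the profiling stream is encoded or routed. The following is a minimal Python behavioral model, not the authors' HLS implementation. It assumes that each data token carries a side-band list of (FIFO name, occupancy) samples and that FIFO fullness is sampled on every read. `Token`, `ProfiledFifo`, `split`, `merge`, and `run_pipeline` are hypothetical names introduced for illustration. The model shows the profile splitting at a fork, merging at a join, and reaching a PS-side reader together with the data it describes.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Tuple

Sample = Tuple[str, int]  # (FIFO name, occupancy observed at read time)


@dataclass
class Token:
    """A data value travelling together with its profiling side-band (assumed format)."""
    value: int
    profile: List[Sample] = field(default_factory=list)


class ProfiledFifo:
    """Bounded FIFO that appends its current fullness to each token it emits."""

    def __init__(self, name: str, depth: int) -> None:
        self.name = name
        self.depth = depth
        self.q: deque = deque()

    def push(self, tok: Token) -> None:
        if len(self.q) >= self.depth:
            raise OverflowError(f"FIFO {self.name} is full")
        self.q.append(tok)

    def pop(self) -> Token:
        fullness = len(self.q)  # occupancy just before this read
        tok = self.q.popleft()
        tok.profile.append((self.name, fullness))
        return tok


def split(tok: Token) -> Tuple[Token, Token]:
    # Fork the data to both branches; the accumulated profile follows branch A
    # only, so each sample reaches the PS exactly once after the merge.
    return Token(tok.value, tok.profile), Token(tok.value, [])


def merge(a: Token, b: Token) -> Token:
    # Join the data and concatenate the two profile fragments in lockstep.
    return Token(a.value + b.value, a.profile + b.profile)


def run_pipeline(values: List[int], depth: int = 4) -> List[Token]:
    """Simulate source -> split -> two branches -> merge -> PS-side sink."""
    fin, fa, fb, fout = (ProfiledFifo(n, depth) for n in ("in", "a", "b", "out"))
    for v in values:                      # source fills the input FIFO
        fin.push(Token(v))
    for _ in values:                      # fork stage
        left, right = split(fin.pop())
        fa.push(left)
        fb.push(right)
    for _ in values:                      # join stage
        fout.push(merge(fa.pop(), fb.pop()))
    return [fout.pop() for _ in values]   # hypothetical PS-side reader


if __name__ == "__main__":
    for tok in run_pipeline([10, 20, 30]):
        print(tok.value, tok.profile)
```

Running the script prints `20 [('in', 3), ('a', 3), ('b', 3), ('out', 3)]` for the first token, followed by the same pattern with occupancies 2 and 1. This reflects the FIFOs draining as each token is read. Letting only one branch inherit the accumulated profile at the fork is one possible design choice. It avoids duplicate samples at the join, at the cost of making the profile layout depend on the graph topology.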