ControlPULP: A RISC-V On-Chip Parallel Power Controller for Many-Core HPC Processors with FPGA-Based Hardware-In-The-Loop Power and Thermal Emulation

High-performance computing (HPC) processors are now integrated cyber-physical systems that demand complex, high-bandwidth closed-loop power and thermal control strategies. To efficiently satisfy real-time multi-input multi-output (MIMO) optimal power requirements, high-end processors integrate an on-die power controller system (PCS). While traditional PCSs are built around a simple microcontroller (MCU)-class core, more scalable and flexible PCS architectures are required to support advanced MIMO control algorithms that manage the ever-increasing number of cores and power states, as well as process, voltage, and temperature variability. This paper presents ControlPULP, an open-source, HW/SW RISC-V parallel PCS platform consisting of a single-core MCU with fast interrupt handling, coupled with a scalable multi-core programmable cluster accelerator and a specialized DMA engine for the parallel acceleration of real-time power management policies. ControlPULP relies on FreeRTOS to schedule a reactive power control firmware (PCF) application layer. We demonstrate ControlPULP in a power management use case targeting a next-generation 72-core HPC processor. We first show that the multi-core cluster accelerates the PCF, achieving a 4.9x speedup compared to single-core execution. This enables more advanced power management algorithms within the control hyper-period at a small area overhead, about 0.1% of the area of a modern HPC CPU die. We then assess the PCS and PCF by designing an FPGA-based, closed-loop emulation framework that leverages the heterogeneous SoC paradigm, achieving DVFS tracking with a mean deviation within 3% of the plant's thermal design power (TDP) compared with a software-equivalent model-in-the-loop approach. Finally, we show that the proposed PCF compares favorably with an industry-grade control algorithm under compute-intensive workloads.
