Modern hardware compilers increasingly rely on rich intermediate representations (IRs) to preserve optimization-relevant semantics before generating RTL. However, one important optimization, pipeline optimization, is still largely deferred to backend tools. In common RTL flows, registers are inserted by frontend heuristics or by hardware designers and are adjusted by backend retiming only after the design has been lowered to a much lower-level netlist representation. By that point, much of the operator-level structure originally exposed by the compiler IR has been weakened or lost, which limits opportunities for global, compiler-level pipeline optimization. This paper presents PipeRTL, an IR-level pipeline optimization framework for hardware compilers, instantiated in CIRCT. PipeRTL makes the legality of register relocation explicit in the IR, uses a learned timing predictor to approximate downstream delay behavior, and formulates timing-aware register relocation as a global min-cost flow problem under timing constraints. Evaluation on open-source designs under a commercial backend synthesis flow shows that PipeRTL improves downstream implementation quality on average, reducing critical-path delay, power, and area across the evaluated benchmarks, while also providing a stronger starting point for backend retiming. These results indicate that exposing pipeline optimization as an explicit compiler pass can deliver backend-meaningful gains by improving the sequential structure presented to later stages.
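As background for the min-cost flow formulation mentioned above, the classical minimum-register retiming problem of Leiserson and Saxe can be written as the linear program below. This is the textbook formulation, given only as a sketch of the problem class; it is not taken from PipeRTL, whose formulation uses a learned timing predictor and may differ in its costs and constraints. The notation is assumed here for illustration: the circuit is a graph G = (V, E) of operators, w(e) is the number of registers on edge e, r(v) is the integer retiming label of node v, FI(v) and FO(v) are the fan-in and fan-out edge sets of v, W(u, v) is the minimum register count over all paths from u to v, D(u, v) is the maximum combinational delay among those minimum-register paths, and c is the target clock period.

```latex
% Retimed register count on an edge e = (u -> v):
%   w_r(e) = w(e) + r(v) - r(u)
\begin{align*}
\min_{r \,:\, V \to \mathbb{Z}} \quad
  & \sum_{v \in V} \bigl( |\mathrm{FI}(v)| - |\mathrm{FO}(v)| \bigr)\, r(v) \\
\text{s.t.} \quad
  & r(u) - r(v) \le w(e)
    && \forall\, e = (u \to v) \in E
    && \text{(legality: } w_r(e) \ge 0 \text{)} \\
  & r(u) - r(v) \le W(u, v) - 1
    && \forall\, u, v \in V \text{ with } D(u, v) > c
    && \text{(timing: period} \le c \text{)}
\end{align*}
```

The objective equals the change in total register count, since summing w_r(e) over all edges gives the original total plus the weighted sum of the labels r(v). Every constraint is a difference constraint on two labels, so the dual of this linear program is a min-cost flow problem, which is why register relocation under timing constraints can be solved globally rather than by local moves.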