Industrial datapath designers consider dynamic power consumption to be a key metric. Arithmetic circuits account for a major share of total chip power consumption and are therefore a common target for power optimization. While arithmetic circuit area and dynamic power consumption are often correlated, there is also a tradeoff to consider, as additional gates can be added explicitly to reduce arithmetic circuit activity and hence power consumption. In this work, we consider two forms of power optimization and their interaction: circuit area reduction via arithmetic optimization, and the elimination of redundant computations using both data gating and clock gating. By encoding both of these classes of optimization as local rewrites of expressions, our tool flow can explore them simultaneously using the e-graph data structure, uncovering new opportunities for power saving through arithmetic rewrites. Since power consumption is highly dependent upon the workload performed by the circuit, our tool flow facilitates a data-dependent design paradigm, where an implementation is automatically tailored to particular contexts of data activity. We develop an automated RTL-to-RTL optimization framework, ROVER, that takes circuit input stimuli and generates power-efficient architectures. We evaluate its effectiveness on both open-source arithmetic benchmarks and benchmarks derived from Intel production examples. The tool reduces total power consumption by up to 33.9%.
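To make the idea of data gating as a local rewrite concrete, the following is a minimal Python sketch and not ROVER's implementation. It assumes a hypothetical expression `sel ? a * b : c` and rewrites it so that the multiplier operands are forced to zero whenever the product is discarded. All function names, the 8-bit operand width, and the use of operand bit flips as a proxy for dynamic power are illustrative assumptions.

```python
import random

def mux(sel, a, b):
    """2:1 multiplexer: returns a when sel is 1, otherwise b."""
    return a if sel else b

def original(sel, a, b, c):
    # The multiplier sees every change on a and b, even when its result is discarded.
    return mux(sel, a * b, c), (a, b)

def gated(sel, a, b, c):
    # Data gating (operand isolation): hold the multiplier inputs at 0 when sel is 0.
    # The extra muxes cost area but reduce switching activity in the multiplier.
    ga, gb = mux(sel, a, 0), mux(sel, b, 0)
    return mux(sel, ga * gb, c), (ga, gb)

def toggles(prev, curr, width=8):
    """Count bit flips between two consecutive multiplier operand pairs."""
    mask = (1 << width) - 1
    return sum(bin((p ^ q) & mask).count("1") for p, q in zip(prev, curr))

def simulate(design, stimuli):
    """Run a design over the stimuli; return its outputs and total operand bit flips."""
    outputs, flips, prev = [], 0, (0, 0)
    for sel, a, b, c in stimuli:
        out, operands = design(sel, a, b, c)
        flips += toggles(prev, operands)
        prev = operands
        outputs.append(out)
    return outputs, flips

def make_stimuli(n, p_sel, seed=0):
    """Random 8-bit stimuli in which sel is 1 with probability p_sel."""
    rng = random.Random(seed)
    return [
        (int(rng.random() < p_sel), rng.randrange(256), rng.randrange(256), rng.randrange(256))
        for _ in range(n)
    ]

if __name__ == "__main__":
    for p_sel in (0.1, 0.9):
        stimuli = make_stimuli(1000, p_sel)
        out_orig, flips_orig = simulate(original, stimuli)
        out_gated, flips_gated = simulate(gated, stimuli)
        assert out_orig == out_gated  # the rewrite preserves functionality
        print(f"p_sel={p_sel}: original flips={flips_orig}, gated flips={flips_gated}")
```

In this toy model the two designs always produce identical outputs, while the benefit of gating depends on how often `sel` is low in the stimuli. This is the sense in which the choice of implementation is workload dependent.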