In this paper, we propose a hybrid temporal computing (HTC) framework that combines pulse-rate and temporal data encoding to design ultra-low-energy hardware accelerators. Our approach is inspired by the recently proposed temporal computing, or race logic, which encodes each data value as a single delay and thereby consumes significantly less energy because signal switching is minimized. However, race logic is limited in its applications due to inherent restrictions. The HTC framework overcomes these limitations by encoding signals in both temporal and pulse-rate formats for multiplication and in temporal format for propagation. This approach keeps switching energy low while being general enough to implement a wide range of arithmetic operations. We show how HTC multiplication is performed for both unipolar and bipolar data encoding and present basic designs for multipliers, adders, and multiply-accumulate (MAC) units. We also implement two hardware accelerators for image compression and DSP applications: a Finite Impulse Response (FIR) filter and a Discrete Cosine Transform (DCT)/iDCT engine. Experimental results show that the HTC MAC has a significantly smaller power and area footprint than the Unary MAC design and is orders of magnitude faster. Compared to the CBSC MAC, the HTC MAC reduces power consumption by $45.2\%$ and area footprint by $50.13\%$. For the FIR filter, the HTC design significantly outperforms the Unary design on all metrics. Compared to the CBSC design, the HTC-based FIR filter reduces power consumption by $36.61\%$ and area cost by $45.85\%$. The HTC-based DCT filter retains the quality of the original image with a decent PSNR, while consuming $23.34\%$ less power and occupying $18.20\%$ less area than the CBSC MAC-based DCT filter.