This paper presents an accuracy-enhanced Hybrid Temporal Computing (E-HTC) framework for ultra-low-power hardware accelerators with deterministic addition. Inspired by the recently proposed HTC architecture, which leverages pulse-rate and temporal data encoding to reduce switching activity and energy consumption but loses accuracy due to its multiplexer (MUX)-based scaled addition, we propose two bitstream addition schemes: (1) an Exact Multiple-input Binary Accumulator (EMBA), which performs precise binary accumulation, and (2) a Deterministic Threshold-based Scaled Adder (DTSA), which employs threshold logic for scaled addition. These adders are integrated into a multiply-accumulate (MAC) unit supporting both unipolar and bipolar encodings. To validate the framework, we implement two accelerators: a Finite Impulse Response (FIR) filter and an 8-point Discrete Cosine Transform (DCT)/iDCT engine. Results on a 4x4 MAC show that, in unipolar mode, E-HTC matches the RMSE of the state-of-the-art Counter-Based Stochastic Computing (CBSC) MAC, improves accuracy by 94% over MUX-based HTC, and reduces power and area by 23% and 7%, respectively, compared to MUX-based HTC and by 64% and 74% compared to CBSC. In bipolar mode, the E-HTC MAC achieves 2.09% RMSE, an 83% improvement over MUX-based HTC, and approaches CBSC's 1.40% RMSE, with area and power savings of 28% and 43%, respectively, versus MUX-based HTC and about 76% versus CBSC. In FIR experiments, both E-HTC variants yield PSNR gains of 3--5 dB (30--45% RMSE reduction) while saving 13% power and 3% area. For DCT/iDCT, E-HTC boosts PSNR by 10--13 dB (70--75% RMSE reduction) while saving area and power over both MUX- and CBSC-based designs.
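To illustrate why MUX-based scaled addition loses accuracy while counting-based addition does not, the following is a minimal behavioral sketch in Python. It is not the paper's circuit: the thermometer-style `unary_stream` encoding, the function names, the random MUX select, and the residue-and-threshold rule in `threshold_scaled_add` are all assumptions chosen for illustration, and the actual EMBA and DTSA designs may differ.

```python
import random


def unary_stream(value, length):
    """Hypothetical thermometer-style encoding: `value` ones, then zeros."""
    return [1] * value + [0] * (length - value)


def mux_scaled_add(streams, rng):
    """MUX-style scaled addition: forward one selected input bit per cycle.

    The number of ones estimates sum(values) / N, but each cycle samples
    only one input, so the result depends on the select sequence.
    """
    n = len(streams)
    return sum(streams[rng.randrange(n)][t] for t in range(len(streams[0])))


def exact_accumulate(streams):
    """Exact accumulation: count every one on every input in every cycle."""
    return sum(sum(bits) for bits in zip(*streams))


def threshold_scaled_add(streams):
    """Generic threshold-style scaled adder (an assumption, not the DTSA).

    Adds the per-cycle count of ones to a residue and emits an output pulse
    whenever the residue reaches N, so the output holds floor(sum / N) ones.
    """
    n = len(streams)
    residue = 0
    out = []
    for bits in zip(*streams):
        residue += sum(bits)
        if residue >= n:
            out.append(1)
            residue -= n
        else:
            out.append(0)
    return out


values = [3, 7, 12, 5]          # four inputs, 4-bit range
streams = [unary_stream(v, 16) for v in values]

exact_sum = exact_accumulate(streams)             # 27
scaled_sum = sum(threshold_scaled_add(streams))   # 27 // 4 == 6
mux_sum = mux_scaled_add(streams, random.Random(0))  # varies with the select sequence

print(exact_sum, scaled_sum, mux_sum)
```

In this sketch the exact accumulator returns the true sum and the threshold-style adder is off by less than one output pulse from the ideal scaled value 27/4, whereas the MUX estimate changes with the select sequence. This is the kind of error the abstract attributes to MUX-based scaled addition.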