Advanced driver-assistance systems (ADAS) require neural compute engines that deliver low-latency inference under strict power and area constraints. Posit arithmetic is attractive for such accelerators because it provides high numerical fidelity at low precision, but its variable-length regime encoding increases encode/decode cost and exposes the datapath to large regime-field fault effects. This paper presents EULER-ADAS, a SIMD-enabled logarithmic bounded-Posit neural compute engine for energy-efficient and reliability-aware ADAS acceleration. The proposed datapath combines bounded-regime Posit representation, stage-adaptive logarithmic mantissa multiplication with bit truncation, and a SIMD-shared quire accumulation path supporting Posit-(8,0), Posit-(16,1), and Posit-(32,2) execution. The unified architecture enables 4×Posit-8, 2×Posit-16, or 1×Posit-32 operation without duplicating precision-specific hardware. FPGA implementation shows that the proposed configurations reduce LUT count by up to 41.4%, delay by up to 76.1%, and power by up to 71.9% relative to exact Posit neural compute engines, while achieving up to 10× lower energy-delay product than radix-4 Booth-based Posit multipliers. In 28-nm CMOS, the bounded variants occupy 0.013-0.016 mm², consume 19.8-22.1 mW, and operate at up to 1.84 GHz. Application-level evaluation across image-classification, ADAS, and edge-inference workloads shows that the evaluated Posit-16 and Posit-32 configurations remain within about 1.5 percentage points of FP32 accuracy. A TinyYOLOv3 prototype on Pynq-Z2 achieves 78 ms latency at 0.29 W and 22.6 mJ/frame, demonstrating the suitability of EULER-ADAS for low-power real-time ADAS inference.
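To make the two arithmetic ideas named above concrete, the following is a minimal Python sketch, not the EULER-ADAS datapath. It assumes standard (unbounded) posit decoding and Mitchell's classic logarithmic approximation for the multiplier; the bounded-regime format and the stage-adaptive truncation are not specified in the abstract and are not reproduced here. The function names `decode_posit` and `mitchell_mul` are hypothetical.

```python
import math


def decode_posit(bits: int, n: int = 8, es: int = 0) -> float:
    """Decode a standard n-bit posit with es exponent bits (illustrative only)."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")  # NaR (not a real)
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask  # negative posits are stored in two's complement
    body = bits & ((1 << (n - 1)) - 1)  # the n-1 bits after the sign

    # Variable-length regime: a run of identical bits starting at bit n-2.
    first = (body >> (n - 2)) & 1
    run, i = 0, n - 2
    while i >= 0 and ((body >> i) & 1) == first:
        run += 1
        i -= 1
    k = run - 1 if first else -run
    i -= 1  # skip the regime-terminating bit, if present
    remaining = max(i + 1, 0)

    # Exponent bits (may be cut off when the regime is long).
    e_bits = min(es, remaining)
    exp = (body >> (remaining - e_bits)) & ((1 << e_bits) - 1) if e_bits else 0
    exp <<= es - e_bits
    remaining -= e_bits

    # Whatever is left is the fraction, with an implicit leading 1.
    frac = body & ((1 << remaining) - 1) if remaining else 0
    mant = 1.0 + (frac / (1 << remaining) if remaining else 0.0)
    value = mant * 2.0 ** (k * (1 << es) + exp)
    return -value if sign else value


def mitchell_mul(a: float, b: float) -> float:
    """Approximate a*b for positive a, b using log2(1+x) ~ x (Mitchell)."""
    assert a > 0 and b > 0
    ma, ea = math.frexp(a)  # a = ma * 2**ea with ma in [0.5, 1)
    mb, eb = math.frexp(b)
    x, y = 2 * ma - 1, 2 * mb - 1  # a = (1+x) * 2**(ea-1), x in [0, 1)
    s = x + y
    scale = (ea - 1) + (eb - 1)
    # The antilog step uses the same linear approximation, in two segments.
    return math.ldexp(1 + s, scale) if s < 1 else math.ldexp(2 * s, scale)


if __name__ == "__main__":
    print(decode_posit(0x50))        # Posit-(8,0) pattern 0101_0000
    print(mitchell_mul(1.25, 1.5))   # exact product is 1.875
```

The regime loop shows why decode cost depends on the data: the number of iterations varies with the bit pattern, which a bounded regime would cap. The multiplier replaces the mantissa product with an addition, underestimating by up to about 11% in the worst case (for example, 1.5 × 1.5).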