The growing demand for edge-AI systems requires arithmetic units that balance numerical precision, energy efficiency, and compact hardware while supporting diverse formats. Posit arithmetic offers advantages over floating- and fixed-point representations through its tapered precision, wide dynamic range, and improved numerical robustness. This work presents SPADE, a unified multi-precision SIMD Posit-based multiply-accumulate (MAC) architecture supporting Posit(8,0), Posit(16,1), and Posit(32,2) within a single framework. Unlike prior single-precision or floating/fixed-point SIMD MACs, SPADE introduces a regime-aware, lane-fused SIMD Posit datapath that hierarchically reuses Posit-specific submodules (LOD, complementor, shifter, and multiplier) across 8/16/32-bit precisions without datapath replication. FPGA implementation on a Xilinx Virtex-7 shows a 45.13% LUT and 80% slice reduction for Posit(8,0), and up to 28.44% and 17.47% improvement for Posit(16,1) and Posit(32,2), respectively, over prior work, with only 6.9% LUT and 14.9% register overhead for multi-precision support. ASIC results across TSMC nodes achieve 1.38 GHz at 6.1 mW (28 nm). Evaluation on MNIST, CIFAR-10/100, and alphabet datasets confirms competitive inference accuracy.
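To make the three supported formats concrete, the sketch below decodes a Posit(n,es) bit pattern into its real value following the standard posit encoding (sign, variable-length regime, es exponent bits, fraction). This is an illustrative reference model only, not the SPADE hardware datapath; the function name `decode_posit` is ours.

```python
def decode_posit(bits: int, n: int, es: int) -> float:
    """Reference (software) decoder for an n-bit posit with es exponent bits.

    Illustrates the tapered-precision encoding used by Posit(8,0),
    Posit(16,1), and Posit(32,2); not the hardware implementation.
    """
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")  # NaR (Not a Real)
    sign = (bits >> (n - 1)) & 1
    if sign:
        bits = (-bits) & mask  # negative posits decode from the 2's complement
    # Regime: run of identical bits after the sign (the LOD finds this run).
    pos = n - 2
    first = (bits >> pos) & 1
    run = 0
    while pos >= 0 and ((bits >> pos) & 1) == first:
        run += 1
        pos -= 1
    k = (run - 1) if first else -run
    pos -= 1  # skip the regime terminator bit, if present
    # Exponent: next es bits (zero-padded if the posit ran out of bits).
    exp = 0
    for _ in range(es):
        exp <<= 1
        if pos >= 0:
            exp |= (bits >> pos) & 1
            pos -= 1
    # Fraction: remaining bits with an implicit leading 1.
    frac = 1.0
    scale = 0.5
    while pos >= 0:
        frac += scale * ((bits >> pos) & 1)
        scale /= 2
        pos -= 1
    useed = 2 ** (2 ** es)
    val = (useed ** k) * (2 ** exp) * frac
    return -val if sign else val
```

For example, the Posit(8,0) pattern `0x40` decodes to 1.0 and `0x60` to 2.0; the same value 1.0 is `0x4000` in Posit(16,1). The variable-length regime is what gives posits their tapered precision: values near 1 get more fraction bits than values far from 1.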