Ternary neural networks (TNNs) offer a superior accuracy-energy trade-off compared to binary neural networks. However, until now, they have required specialized accelerators to realize their efficiency potential, which has hindered widespread adoption. To address this, we present xTern, a lightweight extension of the RISC-V instruction set architecture (ISA) targeted at accelerating TNN inference on general-purpose cores. To complement the ISA extension, we developed a set of optimized kernels leveraging xTern, achieving 67% higher throughput than their 2-bit equivalents. Power consumption increases only marginally, by 5.2%, resulting in an energy efficiency improvement of 57.1%. We demonstrate that the proposed xTern extension, integrated into an octa-core compute cluster, incurs a minimal silicon area overhead of 0.9% with no impact on timing. In end-to-end benchmarks, we show that xTern enables the deployment of TNNs achieving up to 1.6 percentage points higher CIFAR-10 classification accuracy than 2-bit networks at equal inference latency. Our results demonstrate that xTern enables RISC-V-based ultra-low-power edge AI platforms to benefit from the efficiency potential of TNNs.