Efficient hardware implementation of nonlinear activation functions is crucial for deploying artificial neural networks on resource-constrained edge devices such as Field-Programmable Gate Arrays (FPGAs). The sigmoid activation function is widely used for probabilistic outputs, binary classification, and gating mechanisms in recurrent neural networks, despite its reliance on exponential computations. This paper presents a hardware-efficient FPGA implementation of the sigmoid activation function using a mixed-radix CORDIC-based architecture. The proposed approach leverages the mathematical relationship between the sigmoid and hyperbolic tangent (tanh) functions. The input range is normalized to 1, so that the corresponding tanh computation operates within a reduced range of 0.5, which significantly improves convergence behavior. To achieve high accuracy with minimal hardware overhead, a modified mixed-radix hyperbolic rotation CORDIC (MR-HRC) algorithm combining radix-2 and radix-4 iterations is introduced. The initial radix-2 stage ensures stable convergence, while the subsequent radix-4 stage accelerates convergence without requiring scale-factor compensation. In the final stage, a radix-2 linear vectoring CORDIC (R2-LVC) computes the hyperbolic tangent by dividing the hyperbolic sine by the hyperbolic cosine, both obtained from the MR-HRC algorithm. The entire architecture is fully pipelined and implemented on a Xilinx Virtex-7 FPGA using a 16-bit fixed-point representation. Experimental results demonstrate a significant reduction in hardware utilization: the design requires only 835 logic slices and no DSP blocks. Additionally, the design achieves a mean absolute error of 4.23 × 10^-4, outperforming several recent sigmoid implementations.
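The data flow described above can be illustrated with a minimal software sketch. It relies on the standard identity sigmoid(x) = (1 + tanh(x/2)) / 2, which maps an input of magnitude at most 1 to a tanh argument of magnitude at most 0.5. The sketch is not the proposed architecture: it assumes floating-point arithmetic instead of the 16-bit fixed-point format, and it substitutes a plain radix-2 hyperbolic rotation for the mixed-radix MR-HRC stage, so the radix-4 iterations are omitted. All function names and the iteration count `n` are hypothetical choices made for illustration.

```python
import math


def hyperbolic_shift_indices(n):
    """Yield n shift indices 1, 2, 3, 4, 4, 5, ...; radix-2 hyperbolic
    CORDIC repeats indices 4, 13, 40, ... to guarantee convergence."""
    i, repeat_at, count = 1, 4, 0
    while count < n:
        yield i
        count += 1
        if i == repeat_at and count < n:
            yield i
            count += 1
            repeat_at = 3 * repeat_at + 1
        i += 1


def sinh_cosh_cordic(z, n=16):
    """Radix-2 hyperbolic rotation (stand-in for MR-HRC, an assumption).
    Returns gain-scaled (sinh z, cosh z); the common gain cancels later
    when the two values are divided."""
    x, y = 1.0, 0.0
    for i in hyperbolic_shift_indices(n):
        d = 1.0 if z >= 0 else -1.0
        t = 2.0 ** -i
        x, y = x + d * y * t, y + d * x * t
        z -= d * math.atanh(t)
    return y, x


def divide_cordic(num, den, n=16):
    """Radix-2 linear vectoring: drives num toward 0 and accumulates the
    quotient num/den. Assumes den > 0 and |num/den| < 1."""
    q = 0.0
    for i in range(1, n + 1):
        d = 1.0 if num >= 0 else -1.0
        t = 2.0 ** -i
        num -= d * den * t
        q += d * t
    return q


def sigmoid_cordic(x, n=16):
    """Sigmoid for |x| <= 1 via sigmoid(x) = (1 + tanh(x/2)) / 2."""
    s, c = sinh_cosh_cordic(0.5 * x, n)
    return 0.5 * (1.0 + divide_cordic(s, c, n))


if __name__ == "__main__":
    for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
        ref = 1.0 / (1.0 + math.exp(-x))
        print(f"x={x:+.1f}  cordic={sigmoid_cordic(x):.6f}  ref={ref:.6f}")
```

Because tanh is formed as a ratio, the CORDIC gain that scales both the sine and cosine outputs cancels in the division, which is one reason a sketch of this kind needs no separate scale-factor multiplication. A hardware version would replace the floating-point multiplications by powers of two with arithmetic shifts and read the atanh constants from a small lookup table.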