The Fourier Analysis Network (FAN) was recently proposed as a simple way to improve neural network performance by replacing a portion of the Rectified Linear Unit (ReLU) activations with sine and cosine functions. Although several studies have reported small but consistent gains across tasks, the underlying mechanism behind these improvements has remained unclear. In this work, we show that only the sine activation contributes positively to performance, whereas the cosine activation tends to be detrimental. Our analysis reveals that the improvement is not a consequence of the sine function's periodic nature; instead, it stems from the function's local behavior near x = 0, where its non-zero derivative mitigates the vanishing-gradient problem. We further show that FAN primarily alleviates the dying-ReLU problem, in which a neuron consistently receives negative inputs, produces zero gradients, and stops learning. Although modern ReLU-like activations, such as Leaky ReLU, GELU, and Swish, shrink ReLU's zero-gradient region, they still contain input regions where gradients remain severely diminished, slowing optimization and hindering convergence. FAN addresses this limitation by introducing a more stable gradient pathway. This analysis shifts the understanding of FAN's benefits from a spectral interpretation to a concrete analysis of training dynamics, leading to the development of the Dual-Activation Layer (DAL), a more efficient convergence accelerator. We evaluate DAL on three tasks: classification of noisy sinusoidal signals versus pure noise, MNIST digit classification, and Electrocardiogram (ECG)-based biometric recognition. In all cases, DAL models converge faster and achieve equal or higher validation accuracy compared to models with conventional activations.
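The gradient argument above can be illustrated numerically: for negative pre-activations, ReLU's derivative is exactly zero (the dying-ReLU regime), whereas the sine activation's derivative, cos(x), stays close to 1 for inputs near x = 0 on either side of zero. A minimal sketch of this comparison (the function names here are illustrative, not taken from the paper):

```python
import numpy as np

def relu_grad(x):
    # ReLU derivative: 1 for positive inputs, 0 otherwise.
    # A neuron stuck on the negative side receives no gradient signal.
    return (x > 0).astype(float)

def sine_grad(x):
    # d/dx sin(x) = cos(x): near 1 for inputs close to zero,
    # regardless of sign, so the neuron keeps receiving gradients.
    return np.cos(x)

x = np.array([-0.5, -0.1, 0.1, 0.5])
print(relu_grad(x))  # zero gradient for the two negative inputs
print(sine_grad(x))  # all four values remain close to 1
```

This local behavior near zero, rather than periodicity, is what the abstract identifies as the source of FAN's benefit.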