Activation functions enable neural networks to learn complex representations by introducing non-linearities. While feedforward models commonly use rectified linear units, sequential models such as recurrent neural networks (RNNs), long short-term memory networks (LSTMs), and gated recurrent units (GRUs) still rely on Sigmoid and TanH activation functions. However, when trained on small sequential datasets, these classical activation functions often struggle to model the sparse patterns needed to capture temporal dependencies effectively. To address this limitation, we propose the squared Sigmoid TanH (SST) activation, specifically tailored to enhance the learning capability of sequential models under data constraints. SST applies mathematical squaring to amplify the differences between strong and weak activations as signals propagate over time, facilitating improved gradient flow and information filtering. We evaluate SST-powered LSTMs and GRUs on diverse applications with limited data, including sign language recognition, regression, and time-series classification. Our experiments demonstrate that SST models consistently outperform RNN-based models with baseline activations, exhibiting improved test accuracy.
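To illustrate the amplification effect described above, the following minimal sketch compares how far apart a strong and a weak gate activation are before and after squaring. The abstract does not state the exact SST formula, so the functions `squared_sigmoid` and `squared_tanh` (including the sign-preserving square used for TanH) and the helper `contrast` are assumptions made for illustration, not the authors' definition.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def squared_sigmoid(x: float) -> float:
    """Assumed form: square of the sigmoid output (stays in (0, 1))."""
    s = sigmoid(x)
    return s * s


def squared_tanh(x: float) -> float:
    """Assumed form: sign-preserving square of the tanh output (stays in (-1, 1))."""
    t = math.tanh(x)
    return math.copysign(t * t, t)


def contrast(f, strong: float, weak: float) -> float:
    """Ratio between the activation of a strong and a weak pre-activation."""
    return f(strong) / f(weak)


if __name__ == "__main__":
    strong, weak = 2.0, -2.0  # hypothetical pre-activation values
    print("sigmoid contrast:        ", contrast(sigmoid, strong, weak))
    print("squared sigmoid contrast:", contrast(squared_sigmoid, strong, weak))
    print("squared tanh at +/-0.5:  ", squared_tanh(0.5), squared_tanh(-0.5))
```

Because sigmoid(x) / sigmoid(-x) equals e^x, squaring the outputs squares this ratio, so under these assumptions weak activations are pushed closer to zero relative to strong ones. Whether this matches the paper's formulation should be checked against its method section.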