Prior work that relies on a single low-dimensional activation function in a network has run into internal covariate shift and gradient deviation problems. Comparatively little research has examined how combinations of functions can supply the properties that a single activation function lacks. We propose a network adversarial method to address these challenges. This is the first method to use different activation functions within one network. Starting from the activation function already used in a network, we construct an adversarial function whose derivative graph has the opposite properties, and the two are used alternately as the activation functions of different network layers. For more complex cases, we propose high-dimensional function graph decomposition (HD-FGD), which splits the function into parts that are then passed through a linear layer. Integrating the inverse of the partial derivative of each decomposed term, and recombining the results according to the computational rules of the decomposition, yields the corresponding adversarial function. Either the network adversarial method or HD-FGD alone can effectively replace the conventional MLP-plus-activation-function design. With these methods, we obtain substantial improvements over standard activation functions in both training efficiency and predictive accuracy. The paper addresses the adversarial construction for several prevalent activation functions and presents alternatives that can be integrated into existing models without adverse effects. We will release the code as open source after the conference review process is completed.
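The alternating scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not give the construction of the adversarial function, so everything below is an assumption made for illustration: the base activation is taken to be tanh, and the hypothetical counterpart `counterpart_act(x) = x - tanh(x)` is chosen only because its derivative, tanh(x)^2, is small where the derivative of tanh is large and vice versa. The helper names (`init_layer`, `alternating_mlp_forward`) and the layer sizes are likewise illustrative and not taken from the paper.

```python
import math
import random


def tanh_act(x):
    """Base activation; its derivative 1 - tanh(x)**2 peaks at x = 0."""
    return math.tanh(x)


def counterpart_act(x):
    """Hypothetical stand-in for an adversarial function (an assumption,
    not the paper's construction): its derivative is tanh(x)**2, which is
    0 at x = 0 and approaches 1 in the tails."""
    return x - math.tanh(x)


def init_layer(n_in, n_out, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear(vec, weights, bias):
    """Compute weights @ vec + bias for a single input vector."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def alternating_mlp_forward(vec, layers):
    """Forward pass that alternates the two activations layer by layer:
    even-indexed layers use tanh, odd-indexed layers use the counterpart.
    For simplicity the activation is also applied after the last layer."""
    activations = (tanh_act, counterpart_act)
    for i, (weights, bias) in enumerate(layers):
        act = activations[i % 2]
        vec = [act(z) for z in linear(vec, weights, bias)]
    return vec


rng = random.Random(0)
sizes = [4, 8, 8, 3]  # illustrative layer widths
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
output = alternating_mlp_forward([0.5, -1.0, 2.0, 0.0], layers)
```

In this sketch the derivatives of the two activations sum to 1 at every input, which is one simple way to read "opposite derivative properties"; the paper's own construction, and HD-FGD in particular, may differ.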