A Significantly Better Class of Activation Functions Than ReLU Like Activation Functions

This paper introduces a class of activation functions that performs significantly better than the almost universally used ReLU-like and sigmoidal classes. Two new activation functions, the Cone and the Parabolic-Cone, are proposed; they differ drastically from popular activation functions and significantly outperform them on the CIFAR-10 and Imagenette benchmarks. The cone activation functions are positive only on a finite interval, zero at the end-points of that interval, and strictly negative everywhere outside it. Thus the set of inputs that produces a positive output for a neuron with a cone activation function is a hyperstrip, not a half-space as is usually the case. Since a hyperstrip is the region between two parallel hyperplanes, it allows neurons to divide the input feature space into positive and negative classes more finely than infinitely wide half-spaces do. In particular, the XOR function can be learned by a single neuron with a cone-like activation function. Both the Cone and Parabolic-Cone activation functions are shown to achieve higher accuracies with significantly fewer neurons on benchmarks. The results presented in this paper indicate that many nonlinear real-world datasets may be separated with fewer hyperstrips than half-spaces. The Cone and Parabolic-Cone activation functions have larger derivatives than ReLU and are shown to significantly speed up training.
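
The single-neuron XOR claim can be illustrated with a minimal sketch. The abstract does not state the formulas, so the forms used here, Cone as `1 - |z - 1|` and Parabolic-Cone as `z * (2 - z)`, are assumptions chosen to match the description (positive on the open interval (0, 2), zero at 0 and 2, negative elsewhere). The weights `(2, 2)` and bias `-1` are hand-picked for illustration rather than learned, and the helper names are hypothetical.

```python
def cone(z: float) -> float:
    """Assumed Cone form: 1 - |z - 1|, positive only for 0 < z < 2."""
    return 1.0 - abs(z - 1.0)


def parabolic_cone(z: float) -> float:
    """Assumed Parabolic-Cone form: z * (2 - z), positive only for 0 < z < 2."""
    return z * (2.0 - z)


def neuron(x1: float, x2: float, activation, w=(2.0, 2.0), b=-1.0) -> float:
    """One neuron: affine pre-activation followed by the given activation.

    The weights and bias are illustrative, hand-picked values (not learned).
    """
    z = w[0] * x1 + w[1] * x2 + b
    return activation(z)


def xor_predictions(activation) -> list:
    """Classify the four XOR inputs as 1 when the neuron output is positive."""
    inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    return [int(neuron(x1, x2, activation) > 0) for x1, x2 in inputs]


if __name__ == "__main__":
    print(xor_predictions(cone))            # [0, 1, 1, 0]
    print(xor_predictions(parabolic_cone))  # [0, 1, 1, 0]
```

With these values the pre-activations for the four inputs are -1, 1, 1 and 3, so only the two mixed inputs fall inside the hyperstrip 0 < z < 2 and receive a positive output. A ReLU or sigmoid neuron cannot do this with one unit, because its positive region is a half-space and XOR is not linearly separable.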
