This paper introduces a class of activation functions that significantly outperforms the almost universally used ReLU-like and sigmoidal activation functions. Two new activation functions, referred to as Cone and Parabolic-Cone, are proposed; they differ drastically from popular activation functions and significantly outperform them on the CIFAR-10 and Imagenette benchmarks. The cone activation functions are positive only on a finite interval, zero at the end-points of that interval, and strictly negative everywhere outside it. Thus the set of inputs that produces a positive output for a neuron with a cone activation function is a hyperstrip rather than a half-space, as is usually the case. Since a hyperstrip is the region between two parallel hyperplanes, it allows neurons to divide the input feature space into positive and negative classes more finely than infinitely wide half-spaces do. In particular, the XOR function can be learned by a single neuron with a cone-like activation function. Both the Cone and Parabolic-Cone activation functions are shown to achieve higher accuracies with significantly fewer neurons on these benchmarks. The results presented in this paper indicate that many nonlinear real-world datasets may be separated with fewer hyperstrips than half-spaces. The Cone and Parabolic-Cone activation functions have larger derivatives than ReLU and are shown to significantly speed up training.
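The hyperstrip and XOR claims can be illustrated with a minimal sketch. The abstract does not state the functional forms, so the code assumes Cone is 1 - |z - 1| and Parabolic-Cone is z(2 - z), both positive only for 0 < z < 2. The helper names (`neuron`, `xor_predictions`) and the weights are hypothetical and hand-picked for illustration, not learned and not taken from the paper's experiments.

```python
def cone(z):
    """Cone activation, assumed form 1 - |z - 1|: positive only for 0 < z < 2."""
    return 1.0 - abs(z - 1.0)


def parabolic_cone(z):
    """Parabolic-Cone activation, assumed form z * (2 - z): positive only for 0 < z < 2."""
    return z * (2.0 - z)


def neuron(x, w, b, activation):
    """Single neuron: activation applied to the affine pre-activation w . x + b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(z)


# Hand-picked (not learned) parameters. The hyperstrip 0 < 2*x1 + 2*x2 - 1 < 2
# contains (0, 1) and (1, 0) but excludes (0, 0) and (1, 1).
W, B = (2.0, 2.0), -1.0


def xor_predictions(activation):
    """Classify the four XOR inputs by the sign of a single neuron's output."""
    inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    return [neuron(x, W, B, activation) > 0 for x in inputs]


if __name__ == "__main__":
    print(xor_predictions(cone))            # [False, True, True, False]
    print(xor_predictions(parabolic_cone))  # [False, True, True, False]
```

A half-space neuron cannot produce this labelling, because the two positive points lie between the two negative points along the direction of the weight vector. The strip between the two parallel hyperplanes z = 0 and z = 2 captures exactly that middle region.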