Efficient Activation Function Optimization through Surrogate Modeling

Carefully designed activation functions can improve the performance of neural networks in many machine learning tasks. However, it is difficult for humans to construct optimal activation functions, and current activation function search algorithms are prohibitively expensive. This paper aims to improve the state of the art through three steps: First, the benchmark datasets Act-Bench-CNN, Act-Bench-ResNet, and Act-Bench-ViT were created by training convolutional, residual, and vision transformer architectures from scratch with 2,913 systematically generated activation functions. Second, a characterization of the benchmark space was developed, leading to a new surrogate-based method for optimization. More specifically, the spectrum of the Fisher information matrix associated with the model's predictive distribution at initialization and the activation function's output distribution were found to be highly predictive of performance. Third, the surrogate was used to discover improved activation functions in several real-world tasks, with a surprising finding: a sigmoidal design that outperformed all other activation functions was discovered, challenging the status quo of always using rectifier nonlinearities in deep learning. Each of these steps is a contribution in its own right; together they serve as a practical and theoretical foundation for further research on activation function optimization.
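The two feature families named in the abstract can be illustrated with a minimal sketch. This is not the paper's feature-extraction code: the tiny one-hidden-layer softmax classifier, the Gaussian inputs, the layer sizes, the histogram range, and the function names `fisher_eigenvalues` and `output_histogram` are all assumptions chosen for illustration. The sketch uses NumPy to compute the eigenvalues of the Fisher information matrix of the model's predictive distribution at random initialization, and a histogram of the activation function's outputs on standard normal inputs.

```python
import numpy as np


def fisher_eigenvalues(act, n_samples=64, d=8, h=16, c=4, seed=0):
    """Eigenvalues of the Fisher information matrix of a tiny one-hidden-layer
    softmax classifier at random initialization (illustrative setup only)."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(h, d))  # hidden weights
    w2 = rng.normal(0.0, 1.0 / np.sqrt(h), size=(c, h))  # output weights
    xs = rng.normal(size=(n_samples, d))                 # assumed Gaussian inputs
    n_params = h * d + c * h
    fim = np.zeros((n_params, n_params))
    eps = 1e-4
    for x in xs:
        u = w1 @ x
        a = act(u)
        # Central-difference estimate of the activation derivative f'(u).
        da = (act(u + eps) - act(u - eps)) / (2 * eps)
        logits = w2 @ a
        p = np.exp(logits - logits.max())
        p /= p.sum()
        # Jacobian of the logits with respect to all parameters (w1, then w2).
        j_w1 = (w2 * da)[:, :, None] * x[None, None, :]  # shape (c, h, d)
        j_w2 = np.kron(np.eye(c), a[None, :])            # shape (c, c * h)
        jac = np.concatenate([j_w1.reshape(c, -1), j_w2], axis=1)
        # Fisher information of a softmax output: J^T (diag(p) - p p^T) J.
        fim += jac.T @ (np.diag(p) - np.outer(p, p)) @ jac
    fim /= n_samples
    return np.linalg.eigvalsh(fim)  # ascending order


def output_histogram(act, n_samples=10000, bins=16, lo=-5.0, hi=5.0, seed=0):
    """Normalized histogram of activation outputs on standard normal inputs."""
    rng = np.random.default_rng(seed)
    y = np.clip(act(rng.normal(size=n_samples)), lo, hi)
    hist, _ = np.histogram(y, bins=bins, range=(lo, hi))
    return hist / n_samples


def relu(u):
    return np.maximum(u, 0.0)


def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))


if __name__ == "__main__":
    for name, act in [("relu", relu), ("sigmoid", sigmoid)]:
        eigs = fisher_eigenvalues(act)
        feats = np.concatenate([eigs, output_histogram(act)])
        print(name, "feature vector length:", feats.size,
              "largest FIM eigenvalue:", eigs[-1])
```

A feature vector of this kind could then be passed to any regression model acting as the surrogate. Which features and which regressor the paper actually uses is not stated in the abstract, so that step is left out here.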


