Efficient Activation Function Optimization through Surrogate Modeling

Carefully designed activation functions can improve the performance of neural networks in many machine learning tasks. However, it is difficult for humans to construct optimal activation functions, and current activation function search algorithms are prohibitively expensive. This paper aims to improve the state of the art through three steps: First, the benchmark datasets Act-Bench-CNN, Act-Bench-ResNet, and Act-Bench-ViT were created by training convolutional, residual, and vision transformer architectures from scratch with 2,913 systematically generated activation functions. Second, a characterization of the benchmark space was developed, leading to a new surrogate-based method for optimization. More specifically, the spectrum of the Fisher information matrix associated with the model's predictive distribution at initialization and the activation function's output distribution were found to be highly predictive of performance. Third, the surrogate was used to discover improved activation functions in CIFAR-100 and ImageNet tasks. Each of these steps is a contribution in its own right; together they serve as a practical and theoretical foundation for further research on activation function optimization. Code is available at https://github.com/cognizant-ai-labs/aquasurf, and the benchmark datasets are at https://github.com/cognizant-ai-labs/act-bench.
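The two surrogate features named in the abstract can be illustrated on a toy model. One is the spectrum of the Fisher information matrix at initialization. The other is the activation function's output distribution.

The sketch below is a hypothetical simplification under stated assumptions, not the AQuaSurf implementation. It samples an activation function's outputs for standard-normal inputs. It also computes the eigenvalues of the Fisher information matrix of a tiny one-hidden-layer softmax classifier at random initialization, taking the expectation over labels drawn from the model's own predictive distribution.

The network sizes, the Gaussian input distribution, the restriction to output-layer weights, and the names `output_distribution` and `fisher_spectrum` are all illustrative assumptions.

```python
import numpy as np


def output_distribution(act, n=10_000, seed=0):
    """Sample act(x) for x ~ N(0, 1); the samples approximate the output distribution.

    The standard-normal input is an assumption made for illustration.
    """
    rng = np.random.default_rng(seed)
    return act(rng.standard_normal(n))


def fisher_spectrum(act, d_in=4, d_hidden=8, n_classes=3, n_samples=64, seed=0):
    """Eigenvalues of the Fisher information matrix of a toy network at initialization.

    Hypothetical model: p(y | x) = softmax(W2 @ act(W1 @ x)), with random W1, W2.
    For a closed-form Jacobian, only W2 is treated as the parameter vector;
    W1 is held fixed, so the activation enters through the hidden features h.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((d_hidden, d_in)) / np.sqrt(d_in)
    W2 = rng.standard_normal((n_classes, d_hidden)) / np.sqrt(d_hidden)
    X = rng.standard_normal((n_samples, d_in))  # assumed input distribution

    dim = n_classes * d_hidden
    F = np.zeros((dim, dim))
    for x in X:
        h = act(W1 @ x)
        logits = W2 @ h
        p = np.exp(logits - logits.max())  # numerically stable softmax
        p /= p.sum()
        # With y drawn from the model's own predictive distribution,
        # E_y[(e_y - p)(e_y - p)^T] = diag(p) - p p^T is the Fisher w.r.t. the logits.
        G = np.diag(p) - np.outer(p, p)
        # logits_c = sum_j W2[c, j] * h[j], so the Jacobian w.r.t. the row-major
        # flattening of W2 is block-diagonal with h^T in each block.
        J = np.kron(np.eye(n_classes), h[None, :])  # shape (n_classes, dim)
        F += J.T @ G @ J
    F /= n_samples
    return np.linalg.eigvalsh(F)  # ascending order; F is symmetric PSD


if __name__ == "__main__":
    relu = lambda z: np.maximum(z, 0.0)
    for name, act in [("relu", relu), ("tanh", np.tanh)]:
        out = output_distribution(act)
        eigs = fisher_spectrum(act)
        print(name, "output mean/std:", out.mean().round(3), out.std().round(3),
              "top Fisher eigenvalues:", eigs[-3:].round(4))
```

The sketch stops at feature extraction. How the paper turns these features into a surrogate that predicts accuracy is not reproduced here. A more faithful version would also not restrict the Fisher to the output layer.

Some eigenvalues come out as zero by construction. Adding the same vector to every row of `W2` shifts all logits equally and leaves the softmax unchanged.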


Related content

Activation Function


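In artificial neural networks, the activation function of a node defines the output of that node given an input or set of inputs. A standard integrated circuit can be seen as a digital network of activation functions that are either "on" (1) or "off" (0), depending on the input. This is similar to the behavior of the linear perceptron in neural networks. However, only nonlinear activation functions allow such networks to compute nontrivial problems using a small number of nodes; such activation functions are called nonlinearities.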
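The distinction between linear and nonlinear activations can be checked numerically. The following is a minimal NumPy sketch. The random weights, the `step` threshold unit, and the hand-picked two-unit XOR network are illustrative choices, not taken from the source.

The sketch shows that stacked linear layers collapse into a single linear map. It also shows that two ReLU units compute XOR, which no single affine function of the inputs can represent.

```python
import numpy as np


def step(x):
    """Binary threshold activation: 'on' (1) for positive input, otherwise 'off' (0)."""
    return (np.asarray(x) > 0).astype(int)


def relu(x):
    """Rectified linear unit, a common nonlinear activation."""
    return np.maximum(x, 0)


# 1) Without a nonlinearity, two stacked linear layers equal one linear layer.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 2))
W2 = rng.standard_normal((1, 3))
x = rng.standard_normal(2)
assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)


# 2) With ReLU, two hidden units suffice to compute XOR.
def xor_net(x1, x2):
    h1 = relu(x1 + x2)      # active when at least one input is on
    h2 = relu(x1 + x2 - 1)  # active only when both inputs are on
    return h1 - 2 * h2


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # prints 0 0 0 / 0 1 1 / 1 0 1 / 1 1 0
```

XOR rules out any single affine map f(x1, x2) = a·x1 + b·x2 + c. The cases (0, 0), (1, 0), and (0, 1) force c = 0 and a = b = 1, which gives f(1, 1) = 2 instead of 0.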
