Carefully designed activation functions can improve the performance of neural networks in many machine learning tasks. However, it is difficult for humans to construct optimal activation functions, and current activation function search algorithms are prohibitively expensive. This paper aims to improve the state of the art through three steps: First, the benchmark datasets Act-Bench-CNN, Act-Bench-ResNet, and Act-Bench-ViT were created by training convolutional, residual, and vision transformer architectures from scratch with 2,913 systematically generated activation functions. Second, a characterization of the benchmark space was developed, leading to a new surrogate-based method for optimization. More specifically, the spectrum of the Fisher information matrix associated with the model's predictive distribution at initialization and the activation function's output distribution were found to be highly predictive of performance. Third, the surrogate was used to discover improved activation functions in CIFAR-100 and ImageNet tasks. Each of these steps is a contribution in its own right; together they serve as a practical and theoretical foundation for further research on activation function optimization. Code is available at https://github.com/cognizant-ai-labs/aquasurf, and the benchmark datasets are at https://github.com/cognizant-ai-labs/act-bench.
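As an illustration of the two surrogate features named above, the following is a minimal sketch, not the authors' implementation (which is available in the aquasurf repository). It assumes a one-hidden-layer softmax MLP, Gaussian random inputs, and a finite-difference derivative of the activation function; the function names `fim_eigenvalues` and `activation_outputs`, the layer sizes, and the sampling range are all hypothetical choices made for this example. For a softmax classifier, the Fisher information matrix can be written as the average over inputs of J^T (diag(p) - p p^T) J, where J is the Jacobian of the logits with respect to the parameters and p is the predicted class distribution.

```python
import numpy as np


def fim_eigenvalues(act, n_inputs=64, d_in=8, d_hidden=16, n_classes=4,
                    seed=0, eps=1e-4):
    """Eigenvalues of the Fisher information matrix of a tiny MLP at init.

    Assumed model (illustrative only): logits = W2 @ act(W1 @ x), softmax output.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_inputs, d_in))
    # Random initialization scaled by fan-in (an assumption for this sketch).
    W1 = rng.standard_normal((d_hidden, d_in)) / np.sqrt(d_in)
    W2 = rng.standard_normal((n_classes, d_hidden)) / np.sqrt(d_hidden)

    n_params = W1.size + W2.size
    fim = np.zeros((n_params, n_params))
    for x in X:
        u = W1 @ x                                       # pre-activations
        a = act(u)                                       # hidden activations
        da = (act(u + eps) - act(u - eps)) / (2 * eps)   # central-difference act'
        z = W2 @ a                                       # logits
        p = np.exp(z - z.max())
        p /= p.sum()                                     # softmax probabilities

        # Jacobian of logits w.r.t. W1: dz_i / dW1[j, m] = W2[i, j] * act'(u_j) * x_m
        J1 = (W2 * da)[:, :, None] * x[None, None, :]
        # Jacobian of logits w.r.t. W2: dz_i / dW2[l, j] = delta_il * a_j
        J2 = np.eye(n_classes)[:, :, None] * a[None, None, :]
        J = np.concatenate(
            [J1.reshape(n_classes, -1), J2.reshape(n_classes, -1)], axis=1
        )

        # Fisher information of the softmax output with respect to the logits.
        S = np.diag(p) - np.outer(p, p)
        fim += J.T @ S @ J
    fim /= n_inputs

    # The matrix is symmetric positive semidefinite, so eigvalsh applies.
    return np.linalg.eigvalsh(fim)


def activation_outputs(act, lo=-5.0, hi=5.0, n=1000):
    """Activation outputs on an evenly spaced grid (range is an assumption)."""
    return act(np.linspace(lo, hi, n))


def relu(u):
    return np.maximum(u, 0.0)


if __name__ == "__main__":
    for name, act in [("tanh", np.tanh), ("relu", relu)]:
        eig = fim_eigenvalues(act)
        out = activation_outputs(act)
        print(name, "largest FIM eigenvalue:", eig[-1],
              "output mean/std:", out.mean(), out.std())
```

In a surrogate-based search, feature vectors of this kind would be computed for each candidate activation function before any training, so that candidates resembling known good performers can be prioritized. How the two features are combined into a similarity measure is left open here.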