The choice of activation function is an active area of research, with different proposals aimed at improving optimization while maintaining expressivity. Additionally, the activation function can significantly alter the implicit inductive bias of the architecture, controlling its non-linear behavior. In this paper, in line with previous work, we argue that evolutionary search provides a useful framework for finding new activation functions, and we make two novel observations. The first is that modern pipelines such as AlphaEvolve, which rely on frontier LLMs as mutation operators, allow for a much wider and more flexible search space, e.g., all possible Python functions within a given FLOP budget, eliminating the need for manually constructed search spaces. Moreover, because LLMs encode common knowledge, these pipelines are biased towards meaningful activation functions, leading to a potentially more efficient search of the space. The second observation is that, through this framework, one can target not only performance improvements but also activation functions that encode particular inductive biases. This can be done by using performance on out-of-distribution data as a fitness function, reflecting the degree to which the architecture respects the inherent structure in the data in a manner independent of distribution shifts. We carry out an empirical exploration of this proposal and show that relatively small-scale synthetic datasets can be sufficient for AlphaEvolve to discover meaningful activation functions.
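As a rough illustration of the second observation, the sketch below (our own minimal example, not the paper's pipeline) shows how out-of-distribution error on a small synthetic regression task could serve as the fitness signal for a candidate activation function. The target function, model size, input ranges, and the particular candidate are illustrative assumptions.

```python
# Minimal sketch: score a candidate activation by OOD generalization.
# All specifics (target function, ranges, model size) are assumptions.
import torch
import torch.nn as nn

def candidate_activation(x):
    # Example candidate the search might propose; any Python function
    # within the FLOP budget could take its place.
    return x * torch.sigmoid(x)

class TinyMLP(nn.Module):
    def __init__(self, act, width=64):
        super().__init__()
        self.fc1 = nn.Linear(1, width)
        self.fc2 = nn.Linear(width, width)
        self.fc3 = nn.Linear(width, 1)
        self.act = act

    def forward(self, x):
        x = self.act(self.fc1(x))
        x = self.act(self.fc2(x))
        return self.fc3(x)

def fitness(act, steps=2000, seed=0):
    """Train on in-distribution inputs, score on a shifted (OOD) range."""
    torch.manual_seed(seed)
    target = lambda x: torch.sin(3 * x)            # synthetic ground truth
    x_train = torch.empty(512, 1).uniform_(-1, 1)  # in-distribution inputs
    x_ood = torch.empty(512, 1).uniform_(1, 2)     # shifted inputs for evaluation
    model = TinyMLP(act)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x_train), target(x_train))
        loss.backward()
        opt.step()
    with torch.no_grad():
        ood_loss = nn.functional.mse_loss(model(x_ood), target(x_ood))
    return -ood_loss.item()  # higher fitness = better extrapolation

print(fitness(candidate_activation))
```

In such a setup, the evolutionary loop would repeatedly ask the LLM to mutate `candidate_activation` and keep the variants with the highest fitness; the fitness function above rewards activations that respect the structure of the data beyond the training range.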