The choice of activation function is an active area of research, with different proposals aimed at improving optimization while maintaining expressivity. Additionally, the activation function can significantly alter the implicit inductive bias of the architecture, controlling its non-linear behavior. In this paper, in line with previous work, we argue that evolutionary search provides a useful framework for finding new activation functions, and we make two novel observations. The first is that modern pipelines such as AlphaEvolve, which rely on frontier LLMs as mutation operators, allow for a much wider and more flexible search space, e.g., all possible Python functions within a given FLOP budget, eliminating the need for manually constructed search spaces. Moreover, because LLMs encode common knowledge, these pipelines are biased towards meaningful activation functions, leading to a potentially more efficient search of the space. The second observation is that, through this framework, one can target not only performance improvements but also activation functions that encode particular inductive biases. This can be done by using performance on out-of-distribution data as a fitness function, reflecting the degree to which the architecture respects the inherent structure in the data in a manner independent of distribution shifts. We carry out an empirical exploration of this proposal and show that relatively small-scale synthetic datasets can be sufficient for AlphaEvolve to discover meaningful activation functions.
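As a rough illustration of the second observation, the sketch below (our own minimal example, not the paper's pipeline) shows how out-of-distribution error on a small synthetic regression task could serve as the fitness signal for a candidate activation function. The target function, model size, input ranges, and the particular candidate are illustrative assumptions.

```python
# Minimal sketch: score a candidate activation by OOD generalization.
# All specifics (target function, ranges, model size) are assumptions.
import torch
import torch.nn as nn

def candidate_activation(x):
    # Example candidate the search might propose; any Python function
    # within the FLOP budget could take its place.
    return x * torch.sigmoid(x)

class TinyMLP(nn.Module):
    def __init__(self, act, width=64):
        super().__init__()
        self.fc1 = nn.Linear(1, width)
        self.fc2 = nn.Linear(width, width)
        self.fc3 = nn.Linear(width, 1)
        self.act = act

    def forward(self, x):
        x = self.act(self.fc1(x))
        x = self.act(self.fc2(x))
        return self.fc3(x)

def fitness(act, steps=2000, seed=0):
    """Train on in-distribution inputs, score on a shifted (OOD) range."""
    torch.manual_seed(seed)
    target = lambda x: torch.sin(3 * x)            # synthetic ground truth
    x_train = torch.empty(512, 1).uniform_(-1, 1)  # in-distribution inputs
    x_ood = torch.empty(512, 1).uniform_(1, 2)     # shifted inputs for evaluation
    model = TinyMLP(act)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x_train), target(x_train))
        loss.backward()
        opt.step()
    with torch.no_grad():
        ood_loss = nn.functional.mse_loss(model(x_ood), target(x_ood))
    return -ood_loss.item()  # higher fitness = better extrapolation

print(fitness(candidate_activation))
```

In such a setup, the evolutionary loop would repeatedly ask the LLM to mutate `candidate_activation` and keep the variants with the highest fitness; the fitness function above rewards activations that respect the structure of the data beyond the training range.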