Swim: A General-Purpose, High-Performing, and Efficient Activation Function for Locomotion Control Tasks

Activation functions play a significant role in the performance of deep learning algorithms. In particular, the Swish activation function tends to outperform ReLU on deeper models, including deep reinforcement learning models, across challenging tasks. Despite this progress, ReLU remains the preferred function, partly because it is more efficient than Swish. Furthermore, in contrast to the fields of computer vision and natural language processing, the deep reinforcement learning and robotics domains have been slower to adopt new activation functions, such as Swish, and instead continue to use more traditional functions, like ReLU. To address these issues, we propose Swim, a general-purpose, efficient, and high-performing alternative to Swish. We analyze its properties and explain its high performance relative to Swish in terms of both reward achievement and efficiency. We focus on testing Swim on MuJoCo's locomotion continuous control tasks, since they exhibit more complex dynamics and would therefore benefit most from a high-performing and efficient activation function. We also use the TD3 algorithm in conjunction with Swim and explain this choice in the context of the robot locomotion domain. We conclude that Swim is a state-of-the-art activation function for continuous control locomotion tasks and recommend using it with TD3 as a working framework.

Related content

Activation function

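In an artificial neural network, the activation function of a node defines the output of that node given an input or a set of inputs. A standard integrated circuit can be viewed as a digital network of activation functions that are either on (1) or off (0), depending on the input. This resembles the behavior of a linear perceptron in a neural network. However, only nonlinear activation functions allow such a network to compute nontrivial problems using a small number of nodes; such activation functions are called nonlinearities.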
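To make this definition concrete, the following minimal sketch applies an activation function to a single node's weighted input. It uses the standard forms of the two baselines named in the abstract, ReLU, max(0, x), and Swish, x · sigmoid(βx). Swim is not shown because its formula does not appear in this text. The helper `node_output` and all parameter names are illustrative assumptions, not taken from the paper.

```python
import math


def relu(x: float) -> float:
    """ReLU: passes positive inputs through and clips negatives to zero."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish: x * sigmoid(beta * x); beta = 1 is the common default."""
    return x * sigmoid(beta * x)


def node_output(inputs, weights, bias, activation):
    """Hypothetical single node: weighted sum plus bias, then the activation."""
    pre_activation = sum(w * v for w, v in zip(weights, inputs)) + bias
    return activation(pre_activation)


if __name__ == "__main__":
    inputs, weights, bias = [1.0, 2.0], [0.5, -1.0], 0.25
    print("ReLU node: ", node_output(inputs, weights, bias, relu))
    print("Swish node:", node_output(inputs, weights, bias, swish))
```

For the same negative pre-activation, ReLU outputs exactly zero, while Swish returns a small negative value. The additional sigmoid evaluation in Swish is also the source of the efficiency gap that the abstract mentions.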