Artificial neural networks used for reinforcement learning are structurally rigid: each optimized parameter of the network is tied to its specific placement in the network structure. As a result, a network only works with pre-defined, fixed input and output sizes. This is a consequence of the number of optimized parameters depending directly on the structure of the network. Structural rigidity limits the ability to optimize parameters of policies across multiple environments that do not share input and output spaces. Here, we evolve a set of neurons and plastic synapses, each represented by a gated recurrent unit (GRU). During optimization, the parameters of these fundamental units of a neural network are optimized in different random structural configurations. Earlier work has shown that parameter sharing between units is important for making structurally flexible neurons. We show that it is possible to optimize a set of distinct neuron and synapse types, which mitigates the symmetry dilemma. We demonstrate this by optimizing a single set of neurons and synapses to solve multiple reinforcement learning control tasks simultaneously.
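The idea of building networks from a small set of shared GRU units can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the hidden size `HID`, the connection probability, the choice of synapse inputs (pre- and post-synaptic activation), the readout from the first hidden component, and all names such as `FlexibleNet` and `init_gru` are assumptions made for illustration, and the evolutionary optimization itself is omitted. The sketch only shows that the same parameter sets can be placed into random structures with different input and output sizes.

```python
import numpy as np

HID = 4  # hidden-state size of every unit (hypothetical choice)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru(rng, in_dim, hid_dim=HID):
    # One GRU parameter set: stacked weights for update gate, reset gate, candidate.
    return {"W": rng.normal(0.0, 0.5, (3, in_dim + hid_dim, hid_dim)),
            "b": np.zeros((3, hid_dim))}

def gru_step(p, x, h):
    # One common GRU convention, applied to a batch of units sharing parameters p.
    xh = np.concatenate([x, h], axis=-1)
    z = sigmoid(xh @ p["W"][0] + p["b"][0])  # update gate
    r = sigmoid(xh @ p["W"][1] + p["b"][1])  # reset gate
    cand = np.tanh(np.concatenate([x, r * h], axis=-1) @ p["W"][2] + p["b"][2])
    return (1.0 - z) * h + z * cand

def count_params(units):
    return sum(p["W"].size + p["b"].size for p in units)

class FlexibleNet:
    """Random structure whose neurons and synapses reuse shared GRU types."""

    def __init__(self, neuron_types, synapse_types, n_in, n_hid, n_out, rng):
        self.neuron_types, self.synapse_types = neuron_types, synapse_types
        self.n_in, self.n_out = n_in, n_out
        n = n_in + n_hid + n_out
        # Random directed connectivity (assumed connection probability 0.5).
        self.pre, self.post = np.nonzero(rng.random((n, n)) < 0.5)
        # Each neuron and synapse is randomly assigned one of the shared types.
        self.n_type = rng.integers(len(neuron_types), size=n)
        self.s_type = rng.integers(len(synapse_types), size=self.pre.size)
        self.n_h = np.zeros((n, HID))              # neuron hidden states
        self.s_h = np.zeros((self.pre.size, HID))  # synapse hidden states

    def step(self, obs):
        n = self.n_h.shape[0]
        act = self.n_h[:, 0].copy()  # activation = first hidden component
        act[:self.n_in] = obs        # input neurons are clamped to the observation
        # Synapses update their state from pre- and post-synaptic activity.
        s_in = np.stack([act[self.pre], act[self.post]], axis=-1)
        for t, p in enumerate(self.synapse_types):
            m = self.s_type == t
            self.s_h[m] = gru_step(p, s_in[m], self.s_h[m])
        # Each neuron receives the sum of its incoming synaptic signals.
        incoming = np.zeros(n)
        np.add.at(incoming, self.post, self.s_h[:, 0] * act[self.pre])
        for t, p in enumerate(self.neuron_types):
            m = self.n_type == t
            self.n_h[m] = gru_step(p, incoming[m, None], self.n_h[m])
        return self.n_h[-self.n_out:, 0]  # output neurons are the last n_out

rng = np.random.default_rng(0)
neuron_types = [init_gru(rng, in_dim=1) for _ in range(3)]
synapse_types = [init_gru(rng, in_dim=2) for _ in range(3)]

# The same parameter sets drive networks with different input/output sizes.
small = FlexibleNet(neuron_types, synapse_types, n_in=3, n_hid=5, n_out=1, rng=rng)
large = FlexibleNet(neuron_types, synapse_types, n_in=8, n_hid=16, n_out=2, rng=rng)
out_small = small.step(np.ones(3))
out_large = large.step(np.ones(8))
n_params = count_params(neuron_types + synapse_types)  # independent of structure
```

In this sketch the number of optimized parameters depends only on the number of unit types and their hidden size, not on how many neurons or synapses a particular structure contains, which is what allows one parameter set to be evaluated across environments with different observation and action dimensions.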