Reinforcement learning (RL) is a flexible and efficient method for programming micro-robots in complex environments. Here we investigate whether reinforcement learning can provide insights into biological systems when trained to perform chemotaxis, namely, whether we can learn how intelligent agents process the available information in order to swim towards a target. We run simulations covering a range of agent shapes, sizes, and swim speeds to determine whether the physical constraints on biological swimmers, namely Brownian motion, lead to regions where the training of reinforcement learners fails. We find that the RL agents can perform chemotaxis as soon as it is physically possible and, in some cases, even before active swimming overpowers the stochastic environment. We study the efficiency of the emergent policy and identify convergence in agent size and swim speed. Finally, we study the strategy adopted by the reinforcement learning algorithm to explain how the agents perform their tasks. To this end, we identify three dominant emergent strategies and several rare approaches. These strategies, whilst producing almost identical trajectories in simulation, are distinct and give insight into the possible mechanisms by which biological agents explore their environment and respond to changing conditions.