Active particles are entities that sustain persistent out-of-equilibrium motion by consuming energy. Under certain conditions, they tend to self-organize through coordinated movement, such as swarming via aggregation. Foragers are one example of active particles. When foragers perform non-cooperative foraging tasks, the emergence of such swarming has been attributed to the partial observability of the environment: the presence of another forager can serve as a proxy signal for a nearby food source or resource patch. In this paper, we validate this phenomenon by simulating multiple self-propelled foragers that forage non-cooperatively from multiple resource patches. The foragers operate in a continuous two-dimensional space with stochastic position updates and partial observability. We evolve a shared policy, in the form of a continuous-time recurrent neural network, that serves as a velocity controller for the foragers. To this end, we use an evolution strategy algorithm in which the different samples of the policy distribution are evaluated in the same rollout. We first show that the agents learn to forage adaptively in the environment. We then show that swarming, in the form of aggregation among the foragers, emerges when resource patches are absent. The strength of this swarming behavior appears to be inversely proportional to the amount of resource stored in the foragers, which supports risk-sensitive foraging claims. Empirical analysis of the learned controller's hidden states in minimal test runs reveals their sensitivity to the amount of resource stored in a forager. Clamping these hidden states to represent a smaller amount of resource hastens the forager's learned aggregation behavior.
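As a rough illustration of the kind of controller the abstract describes, the following is a minimal sketch of a continuous-time recurrent neural network (CTRNN) driving a forager's velocity, followed by a stochastic position update. It is not the paper's implementation. It assumes the textbook CTRNN dynamics tau_i * dy_i/dt = -y_i + sum_j w_ij * sigmoid(y_j + b_j) + I_i, forward-Euler integration, a tanh readout of the first two hidden states as the 2-D velocity, and Gaussian positional noise. All function names, parameters, and values (`ctrnn_step`, `velocity_from_state`, `noisy_position_update`, `max_speed`, `noise_std`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function, written via tanh.
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def ctrnn_step(y, inputs, weights, biases, taus, dt):
    """One forward-Euler step of the assumed CTRNN dynamics.

    y, inputs, biases, taus are lists of length n; weights is an n x n
    list of lists where weights[i][j] connects neuron j to neuron i.
    """
    activations = [sigmoid(y_j + b_j) for y_j, b_j in zip(y, biases)]
    new_y = []
    for i in range(len(y)):
        recurrent = sum(w * a for w, a in zip(weights[i], activations))
        dy = (-y[i] + recurrent + inputs[i]) / taus[i]
        new_y.append(y[i] + dt * dy)
    return new_y


def velocity_from_state(y, max_speed):
    # Assumed readout: the first two hidden states give a bounded 2-D velocity.
    return (max_speed * math.tanh(y[0]), max_speed * math.tanh(y[1]))


def noisy_position_update(pos, vel, dt, noise_std, rng):
    # Drift by the commanded velocity plus Gaussian noise scaled by sqrt(dt).
    return tuple(
        p + v * dt + noise_std * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        for p, v in zip(pos, vel)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    n = 3
    y = [0.0] * n
    weights = [[0.0] * n for _ in range(n)]  # placeholder weights, not evolved
    biases = [0.0] * n
    taus = [1.0] * n
    pos = (0.0, 0.0)
    for _ in range(5):
        # The inputs stand in for a forager's partial observation (hypothetical).
        y = ctrnn_step(y, [1.0, -0.5, 0.2], weights, biases, taus, dt=0.1)
        vel = velocity_from_state(y, max_speed=1.0)
        pos = noisy_position_update(pos, vel, dt=0.1, noise_std=0.05, rng=rng)
    print(pos)
```

In a setup like this, "clamping" a hidden state would amount to overwriting one entry of `y` with a fixed value after each call to `ctrnn_step`. Which hidden states encode the stored resource is an empirical finding of the paper and is not modeled here.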