Emergence of Internal State-Modulated Swarming in Multi-Agent Patch Foraging System

Active particles are entities that sustain persistent out-of-equilibrium motion by consuming energy. Under certain conditions, they tend to self-organize through coordinated movement, such as swarming via aggregation. In non-cooperative foraging tasks, the emergence of such swarming among foragers, which exemplify active particles, has been attributed to the partial observability of the environment: the presence of another forager can serve as a proxy signal for a nearby food source or resource patch. In this paper, we validate this phenomenon by simulating multiple self-propelled foragers that forage non-cooperatively from multiple resource patches. The foragers operate in a continuous two-dimensional space with stochastic position updates and partial observability. We evolve a shared policy, a continuous-time recurrent neural network that serves as a velocity controller for the foragers, using an evolutionary strategy algorithm in which the different samples of the policy distribution are evaluated in the same rollout. We first show that the agents learn to forage adaptively in the environment. We then show that swarming, in the form of aggregation, emerges among the foragers when resource patches are absent. The strength of this swarming behavior appears to be inversely proportional to the amount of resource stored in the foragers, which supports risk-sensitive foraging claims. Empirical analysis of the learned controller's hidden states in minimal test runs reveals that they are sensitive to the amount of resource stored in a forager. Clamping these hidden states to represent a smaller amount of resource hastens the forager's learned aggregation behavior.
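As a rough illustration of the kind of controller described above, the following minimal sketch integrates a generic continuous-time recurrent neural network with forward-Euler steps and reads a two-dimensional velocity from its hidden states. It is not the paper's implementation: the network size, the random weights, the time constants, the step size `dt`, and the `velocity_from_state` readout are all assumptions chosen only to make the example self-contained.

```python
import math
import random


def sigmoid(x):
    """Logistic activation used by the standard CTRNN formulation."""
    return 1.0 / (1.0 + math.exp(-x))


def ctrnn_step(y, inputs, weights, biases, taus, dt=0.1):
    """One forward-Euler step of
    tau_i * dy_i/dt = -y_i + sum_j w_ij * sigmoid(y_j + b_j) + I_i.
    """
    act = [sigmoid(yj + bj) for yj, bj in zip(y, biases)]
    new_y = []
    for i in range(len(y)):
        recurrent = sum(w * a for w, a in zip(weights[i], act))
        dy = (-y[i] + recurrent + inputs[i]) / taus[i]
        new_y.append(y[i] + dt * dy)
    return new_y


def velocity_from_state(y, max_speed=1.0):
    """Hypothetical readout: squash the first two hidden states into a velocity."""
    return (max_speed * math.tanh(y[0]), max_speed * math.tanh(y[1]))


# Assumed toy setup: 4 hidden units, random weights, constant sensory input.
rng = random.Random(0)
n = 4
weights = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
biases = [0.0] * n
taus = [1.0] * n
y = [0.0] * n
for _ in range(50):
    y = ctrnn_step(y, [0.5, -0.2, 0.0, 0.0], weights, biases, taus)
vx, vy = velocity_from_state(y)
```

In a sketch like this, clamping a hidden state amounts to overwriting one entry of `y` with a fixed value before each call to `ctrnn_step`. Which hidden states encode the stored resource is an empirical finding of the paper and is not modeled here.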
