The Hive Mind is a Single Reinforcement Learning Agent

Decision-making is an essential attribute of any intelligent agent or group. Natural systems are known to converge to optimal strategies through at least two distinct mechanisms: collective decision-making via imitation of others, and individual trial-and-error. This paper establishes an equivalence between these two paradigms by drawing from the well-established collective decision-making model of nest-hunting in swarms of honey bees. We show that the emergent distributed cognition (sometimes referred to as the $\textit{hive mind}$) arising from individual bees following simple, local imitation-based rules is that of a single online reinforcement learning (RL) agent interacting with many parallel environments. The update rule through which this macro-agent learns is a bandit algorithm that we coin $\textit{Maynard-Cross Learning}$. Our analysis implies that a group of cognition-limited organisms can be equivalent to a more complex, reinforcement-enabled entity, substantiating the idea that group-level intelligence may explain how seemingly simple and blind individual behaviors are selected in nature. From a biological perspective, this analysis suggests how such imitation strategies evolved: they constitute a scalable form of reinforcement learning at the group level, aligning with theories of kin and group selection. Beyond biology, the framework offers new tools for analyzing economic and social systems where individuals imitate successful strategies, effectively participating in a collective learning process. In swarm intelligence, our findings will inform the design of scalable collective systems in artificial domains, enabling RL-inspired mechanisms for coordination and adaptability at scale.
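
The abstract does not state the Maynard-Cross Learning update itself. As a point of reference only, the following minimal sketch implements the classical Cross learning rule for a multi-armed bandit, which the name appears to reference: after playing an arm and observing a reward in [0, 1], probability mass shifts toward that arm in proportion to the reward. The function names, the `step_size` parameter, and the Bernoulli arms are illustrative assumptions and should not be read as the paper's algorithm.

```python
import random


def cross_learning_update(policy, action, reward, step_size=1.0):
    """One classical Cross learning step on a probability vector.

    `reward` must lie in [0, 1]. `step_size` is an assumed extra knob:
    step_size=1.0 gives the textbook Cross rule, smaller values slow it down.
    """
    if not 0.0 <= reward <= 1.0:
        raise ValueError("reward must be in [0, 1]")
    gain = step_size * reward
    return [
        # The played arm moves toward 1, every other arm shrinks toward 0.
        p + gain * (1.0 - p) if i == action else p - gain * p
        for i, p in enumerate(policy)
    ]


def run_bandit(success_probs, steps, step_size=0.05, seed=0):
    """Run the rule on hypothetical Bernoulli arms, starting from a uniform policy."""
    rng = random.Random(seed)
    n_arms = len(success_probs)
    policy = [1.0 / n_arms] * n_arms
    for _ in range(steps):
        # Sample an arm from the current policy, then observe a 0/1 reward.
        action = rng.choices(range(n_arms), weights=policy)[0]
        reward = 1.0 if rng.random() < success_probs[action] else 0.0
        policy = cross_learning_update(policy, action, reward, step_size)
    return policy


if __name__ == "__main__":
    print(run_bandit([0.2, 0.8], steps=2000))
```

The update keeps the policy on the probability simplex because the mass added to the played arm equals the mass removed from the others. In expectation, this rule is known to follow the replicator dynamics, which is the kind of link between population-level imitation and individual reinforcement that the abstract describes.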
