Robots that learn to evaluate models of collective behavior

Understanding and modeling animal behavior is essential for studying collective motion, decision-making, and bio-inspired robotics. Yet the accuracy of behavioral models is still often evaluated through offline comparisons to static trajectory statistics. Here we introduce a reinforcement-learning-based framework that uses a biomimetic robotic fish (RoboFish) to evaluate computational models of live fish behavior through closed-loop interaction. We trained policies in simulation using four distinct fish models (a simple constant-follow baseline, two rule-based models, and a biologically grounded convolutional neural network model) and transferred these policies to the real RoboFish setup, where they interacted with live fish. Policies were trained to guide a simulated fish to goal locations, enabling us to quantify how the response of real fish differs from that of the simulated fish. We evaluate the fish models by quantifying the sim-to-real gap, defined as the Wasserstein distance between the simulated and real distributions of behavioral metrics such as goal-reaching performance, inter-individual distance, wall interactions, and alignment. The neural network-based fish model exhibited the smallest gap in goal-reaching performance and in most other metrics, indicating higher behavioral fidelity than conventional rule-based models under this benchmark. More importantly, this separation shows that the proposed evaluation can quantitatively distinguish candidate models under matched closed-loop conditions. Our work demonstrates how learning-based robotic experiments can uncover deficiencies in behavioral models and provides a general framework for evaluating animal behavior models through embodied interaction.
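To make the sim-to-real gap concrete, the following is a minimal sketch of how a one-dimensional Wasserstein distance between simulated and real samples of a single behavioral metric could be computed, and how per-model gaps could then be compared. It is an illustration under stated assumptions, not the pipeline used in this work: the function names `wasserstein_1d` and `sim_to_real_gaps`, the model labels, and all numeric values are hypothetical placeholders, and the sketch assumes each metric is scalar and is summarized by an empirical distribution.

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two 1D empirical distributions.

    Computed as the integral of |F_u(x) - F_v(x)| over x, where F_u and
    F_v are the empirical CDFs of the two samples.
    """
    if not u or not v:
        raise ValueError("both samples must be non-empty")
    u_sorted = sorted(u)
    v_sorted = sorted(v)
    all_values = sorted(u_sorted + v_sorted)
    total = 0.0
    # Both CDFs are constant between consecutive pooled sample values.
    for left, right in zip(all_values, all_values[1:]):
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


def sim_to_real_gaps(sim_samples_by_model, real_samples):
    """Return {model label: Wasserstein gap to the real samples}."""
    return {
        label: wasserstein_1d(sim_samples, real_samples)
        for label, sim_samples in sim_samples_by_model.items()
    }


if __name__ == "__main__":
    # Placeholder values for one hypothetical metric (not measured data).
    real = [0.10, 0.12, 0.15, 0.20]
    simulated = {
        "model_a": [0.11, 0.13, 0.14, 0.21],
        "model_b": [0.30, 0.35, 0.40, 0.45],
    }
    gaps = sim_to_real_gaps(simulated, real)
    # A smaller gap means the simulated distribution is closer to the real one.
    for label, gap in sorted(gaps.items(), key=lambda item: item[1]):
        print(f"{label}: {gap:.4f}")
```

In this sketch a smaller value indicates that the simulated distribution of the metric lies closer to the real one; repeating the computation per metric would give one gap per model and metric.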
