CBIL: Collective Behavior Imitation Learning for Fish from Real Videos

Reproducing realistic collective behaviors presents a captivating yet formidable challenge. Traditional rule-based methods rely on hand-crafted principles, limiting motion diversity and realism in generated collective behaviors. Recent imitation learning methods learn from data but often require ground truth motion trajectories and struggle with authenticity, especially in high-density groups with erratic movements. In this paper, we present a scalable approach, Collective Behavior Imitation Learning (CBIL), for learning fish schooling behavior directly from videos, without relying on captured motion trajectories. Our method first leverages Video Representation Learning, where a Masked Video AutoEncoder (MVAE) extracts implicit states from video inputs in a self-supervised manner. The MVAE effectively maps 2D observations to implicit states that are compact and expressive for the subsequent imitation learning stage. Then, we propose a novel adversarial imitation learning method to effectively capture the complex movements of fish schools, allowing efficient imitation of the distribution of motion patterns measured in the latent space. The method also incorporates bio-inspired rewards alongside priors to regularize and stabilize training. Once trained, CBIL can be used for various animation tasks with the learned collective motion priors. We further show its effectiveness across different species. Finally, we demonstrate the application of our system in detecting abnormal fish behavior from in-the-wild videos.
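To make the reward structure described above more concrete, the following is a minimal sketch of how an adversarial (discriminator-based) reward might be combined with a bio-inspired term. It is not the formulation used in CBIL: the GAIL-style form `-log(1 - D)`, the pairwise separation penalty, the function names, and the weights `w_style` and `w_bio` are all assumptions chosen for illustration, and the discriminator score is taken as a given number rather than computed from MVAE latent states.

```python
import math

def style_reward(d_score, eps=1e-6):
    """GAIL-style reward from a discriminator score in (0, 1).

    ASSUMPTION: d_score is a hypothetical discriminator's estimate that a
    latent-state transition came from real video rather than simulation.
    """
    d = min(max(d_score, eps), 1.0 - eps)  # clamp so the log stays finite
    return -math.log(1.0 - d)

def separation_reward(positions, min_dist=1.0):
    """ASSUMED bio-inspired term: penalize fish closer than min_dist.

    Returns 0.0 when every pair is at least min_dist apart, otherwise a
    negative value equal to minus the summed pairwise overlap.
    """
    penalty = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dist = math.dist(positions[i], positions[j])
            penalty += max(0.0, min_dist - dist)
    return 0.0 - penalty

def total_reward(d_score, positions, w_style=1.0, w_bio=0.5):
    """Weighted sum of the two terms; the weights are illustrative only."""
    return (w_style * style_reward(d_score)
            + w_bio * separation_reward(positions))

# Two fish half a unit apart, with a discriminator score of 0.5.
reward = total_reward(0.5, [(0.0, 0.0), (0.5, 0.0)])
```

In this sketch the style term grows as the discriminator becomes more convinced that the simulated motion resembles the video data, while the separation term only becomes active when agents crowd each other, which is one plausible way a hand-designed prior could regularize adversarial training.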
