We introduce a sequential reinforcement learning framework for imitation learning designed to model heterogeneous cognitive strategies in pollinators. Focusing on honeybees, our approach leverages trajectory similarity to capture and forecast behavior across individuals that rely on distinct strategies: some exploit numerical cues, others draw on memory, and others are influenced by environmental factors such as weather. Through empirical evaluation, we show that state-of-the-art imitation learning methods often fail in this setting: when expert policies shift across memory windows or deviate from optimality, these models overlook both fast and slow learning behaviors and cannot faithfully reproduce key decision patterns. Moreover, they offer limited interpretability, hindering biological insight. Our contribution addresses these challenges by (i) introducing a model that minimizes predictive loss while identifying the effective memory horizon most consistent with the behavioral data, (ii) ensuring full interpretability so that biologists can analyze the underlying decision-making strategies, and (iii) providing a mathematical framework that links bee policy search with bandit formulations under varying exploration-exploitation dynamics. We also release a novel dataset of 80 tracked bees observed under diverse weather conditions. This benchmark facilitates research on pollinator cognition and supports ecological governance by improving simulations of insect behavior in agroecosystems. Our findings shed new light on the learning strategies and memory interplay shaping pollinator decision-making.