Traditional approaches to studying decision-making in neuroscience focus on simplified behavioral tasks in which animals perform repetitive, stereotyped actions to receive explicit rewards. While informative, these methods constrain our understanding of decision-making to short-timescale behaviors driven by explicit goals. In natural environments, animals exhibit more complex, long-term behaviors driven by intrinsic motivations that are often unobservable. Recent work in time-varying inverse reinforcement learning (IRL) aims to capture shifting motivations in long-term, freely moving behaviors. However, a crucial challenge remains: animals make decisions based on their history, not just their current state. To address this, we introduce SWIRL (SWitching IRL), a novel framework that extends traditional IRL by incorporating time-varying, history-dependent reward functions. SWIRL models long behavioral sequences as transitions between short-term decision-making processes, each governed by a unique reward function. SWIRL incorporates biologically plausible history dependency to capture how past decisions and environmental contexts shape behavior, offering a more accurate description of animal decision-making. We apply SWIRL to simulated and real-world animal behavior datasets and show that it outperforms models lacking history dependency, both quantitatively and qualitatively. This work presents the first IRL model to incorporate history-dependent policies and rewards, advancing our understanding of complex, naturalistic decision-making in animals.
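To make the abstract's core idea concrete, the following is a minimal toy sketch of a history-dependent switching process of the kind SWIRL describes: behavior alternates between short-term decision-making modes, each with its own reward function, and the mode transitions depend on a window of recent states. All dimensions, variable names, and dynamics here are illustrative assumptions, not the paper's actual model or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration (not from the paper).
N_STATES, N_ACTIONS, N_MODES = 5, 3, 2   # discrete states, actions, reward modes
HISTORY = 2                              # how many recent states influence switching

# One reward function per short-term decision-making mode.
rewards = rng.normal(size=(N_MODES, N_STATES))

# Mode-transition logits that depend on recent states (the history dependency):
# a toy rule that sums weights indexed by the last HISTORY visited states.
W = rng.normal(size=(N_MODES, N_MODES, N_STATES))

def next_mode(mode, recent_states):
    """Sample the next reward mode given the current mode and recent states."""
    logits = sum(W[mode, :, s] for s in recent_states)
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(N_MODES, p=p)

def softmax_policy(mode, state, beta=2.0):
    """Boltzmann policy over actions under the currently active mode's reward."""
    # Toy action values: each action shifts the state by a different amount.
    q = np.array([rewards[mode, (state + a) % N_STATES] for a in range(N_ACTIONS)])
    p = np.exp(beta * (q - q.max()))
    return p / p.sum()

# Generate a trajectory in which behavior switches between reward modes.
state, mode = 0, 0
history = [state] * HISTORY
trajectory = []
for t in range(20):
    action = rng.choice(N_ACTIONS, p=softmax_policy(mode, state))
    trajectory.append((t, mode, state, action))
    state = (state + action) % N_STATES          # toy deterministic dynamics
    history = history[1:] + [state]
    mode = next_mode(mode, history)

print(len(trajectory))  # 20 (t, mode, state, action) tuples
```

Inverting such a generative process — recovering the per-mode rewards and the history-dependent switching rule from observed trajectories alone — is the kind of inference problem the SWIRL framework addresses.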