In advancing the understanding of decision-making processes, Inverse Reinforcement Learning (IRL) has proven instrumental in reconstructing animals' multiple intentions from complex behaviors. Given the recent development of a continuous-time multi-intention IRL framework, there has been sustained interest in inferring discrete time-varying rewards with IRL. To address this challenge, we introduce Latent (Markov) Variable Inverse Q-learning (L(M)V-IQL), a novel class of IRL algorithms tailored to discrete intrinsic reward functions. Using an Expectation-Maximization approach, we cluster observed expert trajectories into distinct intentions and solve the IRL problem independently for each. We demonstrate the efficacy of L(M)V-IQL in simulated experiments and on several real mouse behavior datasets, where it surpasses current benchmarks in animal behavior prediction while producing interpretable reward functions. This advance holds promise for neuroscience and cognitive science, contributing to a deeper understanding of decision-making and helping to uncover underlying brain mechanisms.
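
To make the Expectation-Maximization clustering idea concrete, the following is a minimal sketch and not the L(M)V-IQL algorithm itself. It assumes a small tabular setting in which each trajectory is a list of (state, action) pairs, and it replaces the per-intention inverse Q-learning solve with a simpler stand-in: a smoothed tabular policy estimated from responsibility-weighted action counts. The function name `em_cluster_trajectories` and all of its parameters are hypothetical, and the Markov variant, in which the intention can switch within a trajectory, is not covered.

```python
import numpy as np


def em_cluster_trajectories(trajs, n_states, n_actions, n_clusters,
                            n_iters=50, alpha=1.0, seed=0):
    """Soft-cluster trajectories into latent "intentions" with EM.

    trajs: list of trajectories, each a list of (state, action) integer pairs.
    Returns (resp, policies, weights):
      resp[i, k]        posterior probability that trajectory i has intention k
      policies[k, s, a] smoothed action probabilities for intention k
      weights[k]        mixing proportion of intention k
    """
    rng = np.random.default_rng(seed)
    n = len(trajs)

    # Sufficient statistics: per-trajectory state-action visit counts.
    counts = np.zeros((n, n_states, n_actions))
    for i, traj in enumerate(trajs):
        for s, a in traj:
            counts[i, s, a] += 1

    # Random soft initialisation of the responsibilities.
    resp = rng.dirichlet(np.ones(n_clusters), size=n)

    for _ in range(n_iters):
        # M-step: refit each intention from responsibility-weighted counts.
        # (Stand-in for solving an IRL problem per intention.)
        weights = resp.mean(axis=0)
        weighted = np.einsum("ik,isa->ksa", resp, counts) + alpha
        policies = weighted / weighted.sum(axis=2, keepdims=True)

        # E-step: posterior over intentions given each trajectory's actions.
        log_post = np.einsum("isa,ksa->ik", counts, np.log(policies))
        log_post += np.log(weights + 1e-12)
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)

    return resp, policies, weights
```

Calling `em_cluster_trajectories(trajs, n_states, n_actions, n_clusters=2)` on trajectories generated by two different behavioral policies returns one row of `resp` per trajectory, and `resp.argmax(axis=1)` gives a hard intention label. In a full method, the M-step would instead fit a reward function for each intention.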