Understanding an agent's goals from its behavior is fundamental to aligning AI systems with human intentions. Existing goal recognition methods typically rely on an optimal goal-oriented policy representation, which may differ from the actor's true behavior and thus hinder accurate recognition of its goal. To address this gap, this paper introduces Goal Recognition Alignment through Imitation Learning (GRAIL), which uses imitation learning and inverse reinforcement learning to learn one goal-directed policy per candidate goal directly from (potentially suboptimal) demonstration trajectories. By scoring an observed partial trajectory with each learned goal-directed policy in a single forward pass, GRAIL retains the one-shot inference capability of classical goal recognition while leveraging learned policies that capture suboptimal and systematically biased behavior. Across the evaluated domains, GRAIL increases the F1-score by more than 0.5 under systematically biased optimal behavior, achieves gains of approximately 0.1-0.3 under suboptimal behavior, and yields improvements of up to 0.4 under noisy optimal trajectories, while remaining competitive in fully optimal settings. This work contributes toward scalable and robust models for interpreting agent goals in uncertain environments.
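The scoring step described above (evaluating an observed partial trajectory under each candidate goal's learned policy in a single forward pass) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `recognize_goal`, the representation of policies as callables returning action-probability dictionaries, and the uniform prior over goals are all assumptions made for the example.

```python
import numpy as np

def recognize_goal(trajectory, policies):
    """Score a partial trajectory under each candidate goal's learned policy.

    trajectory: list of (state, action) pairs observed so far.
    policies: dict mapping goal -> callable(state) -> dict of action probabilities
              (one learned goal-directed policy per candidate goal).
    Returns a posterior over candidate goals, assuming a uniform prior.
    """
    log_scores = {}
    for goal, policy in policies.items():
        # Sum the log-probabilities the goal's policy assigns to the
        # actions actually observed; a small floor avoids log(0).
        log_scores[goal] = sum(
            np.log(policy(s).get(a, 1e-12)) for s, a in trajectory
        )
    # Softmax-normalize the log-scores into a posterior over goals,
    # subtracting the max logit for numerical stability.
    goals = list(log_scores)
    logits = np.array([log_scores[g] for g in goals])
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return dict(zip(goals, probs))
```

For example, with two toy goal policies, one preferring `"left"` and one preferring `"right"`, a trajectory of mostly `"right"` actions yields a posterior concentrated on the second goal. Because each policy is queried only on the observed states, the inference cost is one forward pass per candidate goal, matching the one-shot inference property claimed above.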