When training artificial intelligence (AI) to perform tasks, humans often care not only about whether a task is completed but also about how it is performed. As AI agents tackle increasingly complex tasks, aligning their behavior with human-provided specifications becomes critical for responsible AI deployment. Reward design provides a direct channel for such alignment by translating human expectations into reward functions that guide reinforcement learning (RL). However, existing methods are often too limited to capture the nuanced human preferences that arise in long-horizon tasks. Hence, we introduce Hierarchical Reward Design from Language (HRDL): a problem formulation that extends classical reward design to encode richer behavioral specifications for hierarchical RL agents. We further propose Language to Hierarchical Rewards (L2HR) as a solution to HRDL. Experiments show that AI agents trained with rewards designed via L2HR not only complete tasks effectively but also better adhere to human specifications. Together, HRDL and L2HR advance research on human-aligned AI agents.