The rapid progress of foundation models has spurred the development of autonomous agents, which leverage the general capabilities of foundation models for reasoning, decision-making, and environmental interaction. However, the efficacy of these agents remains limited in intricate, realistic environments. In this work, we introduce the principles of $\mathbf{U}$nified $\mathbf{A}$lignment for $\mathbf{A}$gents ($\mathbf{UA}^2$), which advocate the simultaneous alignment of agents with human intentions, environmental dynamics, and self-constraints such as limited monetary budgets. From the perspective of $\mathbf{UA}^2$, we review current agent research and highlight the factors neglected in existing agent benchmarks and candidate methods. We also conduct proof-of-concept studies by introducing realistic features to WebShop: user profiles to convey intentions, personalized reranking to model complex environmental dynamics, and runtime cost statistics to reflect self-constraints. We then follow the principles of $\mathbf{UA}^2$ to propose an initial agent design and benchmark it against several candidate baselines in the retrofitted WebShop. The extensive experimental results further demonstrate the importance of the principles of $\mathbf{UA}^2$. Our research sheds light on the next steps of autonomous agent research toward improved general problem-solving abilities.