Goal-directed interactive agents, which autonomously complete tasks through interactions with their environment, can assist humans in various domains of their daily lives. Recent advances in large language models (LLMs) have led to a surge of new, increasingly challenging tasks for evaluating such agents. To properly contextualize performance across these tasks, it is imperative to understand the different challenges they pose to agents. To this end, this survey compiles relevant tasks and environments for evaluating goal-directed interactive agents, structuring them along dimensions relevant for understanding current obstacles. An up-to-date compilation of relevant resources can be found on our project website: https://coli-saar.github.io/interactive-agents.