Despite remarkable advances in emulating human-like behavior with Large Language Models (LLMs), current textual simulations do not adequately address the notion of time. To this end, we introduce TimeArena, a novel textual simulated environment that incorporates complex temporal dynamics and constraints that better reflect real-life planning scenarios. In TimeArena, agents are asked to complete multiple tasks as soon as possible and may process them in parallel to save time. We implement dependencies between actions, the duration of each action, and the occupancy of both the agent and the objects in the environment. TimeArena is grounded in 30 real-world tasks spanning cooking, household activities, and laboratory work. We conduct extensive experiments with various state-of-the-art LLMs using TimeArena. Our findings reveal that even the most powerful models, e.g., GPT-4, still lag behind humans in effective multitasking, underscoring the need for enhanced temporal awareness in the development of language agents.
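To illustrate why dependencies, durations, and occupancy make parallel processing matter, here is a minimal sketch. It is not TimeArena's implementation. The `Action` record, the `simulate` helper, the greedy "start as early as constraints allow" rule, and the assumption that some actions (such as boiling water) keep running in the background without occupying the agent are all hypothetical. They were chosen only to show how the order of actions changes the total completion time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    # Hypothetical action record; field names are illustrative, not TimeArena's API.
    name: str
    duration: int                      # time units the action takes
    deps: tuple[str, ...] = ()         # actions that must finish before this one starts
    objects: tuple[str, ...] = ()      # objects held for the whole duration
    occupies_agent: bool = True        # False = keeps running in the background once started


def simulate(plan: list[Action]) -> tuple[dict[str, tuple[int, int]], int]:
    """Start each action, in plan order, as early as its constraints allow.

    Returns a {name: (start, end)} schedule and the total completion time.
    """
    agent_free_at = 0
    object_free_at: dict[str, int] = {}
    finished_at: dict[str, int] = {}
    schedule: dict[str, tuple[int, int]] = {}
    for act in plan:
        missing = [d for d in act.deps if d not in finished_at]
        if missing:
            raise ValueError(f"{act.name} scheduled before its dependencies {missing}")
        # Earliest start: agent available, dependencies done, objects released.
        start = max(
            [agent_free_at]
            + [finished_at[d] for d in act.deps]
            + [object_free_at.get(o, 0) for o in act.objects]
        )
        end = start + act.duration
        # An occupying action blocks the agent until it ends; a background
        # action frees the agent as soon as it has been started.
        agent_free_at = end if act.occupies_agent else start
        for o in act.objects:
            object_free_at[o] = end
        finished_at[act.name] = end
        schedule[act.name] = (start, end)
    total = max((end for _, end in schedule.values()), default=0)
    return schedule, total


# Hypothetical cooking example: boiling runs in the background.
boil = Action("boil_water", 10, objects=("pot",), occupies_agent=False)
chop = Action("chop_vegetables", 5, objects=("knife",))
cook = Action("cook_noodles", 3, deps=("boil_water",), objects=("pot",))

_, parallel_total = simulate([boil, chop, cook])
_, serial_total = simulate([chop, boil, cook])
print(parallel_total, serial_total)  # 13 18
```

Starting the background action first lets chopping overlap with boiling, so the plan finishes at time 13. Chopping first forces boiling to start later, and that plan finishes at 18, the plain sum of all durations.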