Generative agents in the streets: Exploring the use of Large Language Models (LLMs) in collecting urban perceptions

Evaluating one's surroundings to gain understanding, frame perspectives, and anticipate behavioral reactions is an inherent human trait. These continuous encounters are diverse and complex, however, which makes them difficult to study and to reproduce experimentally. Researchers have been able to isolate environmental features and study their effect on human perception and behavior, but attempts to replicate and study human behavior through proxies, such as virtual mediums and interviews, have produced inconsistent results. Large language models (LLMs) have recently shown a capacity for contextual understanding and semantic reasoning. Trained on large amounts of text, these models can now mimic believable human behavior. This study explores current advances in generative agents powered by LLMs through perceptual experiments. In the experiment, generative agents interact with urban environments through street view images and plan their journeys toward specific goals. The agents are given virtual personalities that make them distinguishable from one another. They are also given a memory database in which they store their thoughts and essential visual information, and from which they retrieve that information when planning their movement. Because LLMs are not embodied, have no access to the visual realm, and lack a sense of motion or direction, we designed movement and visual modules that help the agents gain an overall understanding of their surroundings. The agents are further asked to rate the surroundings they encounter according to their perceived sense of safety and liveliness. Because the agents store details in memory, we query those memories to examine their thought processes. Overall, this study experiments with current AI developments and their potential for simulating human behavior in urban environments.
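
The store-and-retrieve memory described in the abstract can be illustrated with a minimal sketch. The abstract does not say how memories are stored, scored, or retrieved, so everything here is an assumption made for illustration: the `Agent` and `Memory` classes, the 1 to 10 rating scale, and the keyword-overlap retrieval (a simple stand-in for whatever method the study actually uses). No LLM or street view service is called; observations are passed in as plain text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Memory:
    step: int        # order in which the note was stored along the journey
    text: str        # the agent's thought or a description of what it saw
    safety: int      # hypothetical 1-10 rating of perceived safety
    liveliness: int  # hypothetical 1-10 rating of perceived liveliness


@dataclass
class Agent:
    name: str
    persona: str  # free-text virtual personality (illustrative only)
    memories: list[Memory] = field(default_factory=list)

    def remember(self, text: str, safety: int, liveliness: int) -> None:
        """Append one observation, with its ratings, to the memory store."""
        self.memories.append(Memory(len(self.memories), text, safety, liveliness))

    def recall(self, query: str, k: int = 2) -> list[Memory]:
        """Return up to k memories that share words with the query.

        Assumed scoring: number of shared words, ties broken by recency.
        """
        words = set(query.lower().split())
        scored = [
            (len(words & set(m.text.lower().split())), m.step, m)
            for m in self.memories
        ]
        matches = [s for s in scored if s[0] > 0]
        matches.sort(key=lambda s: (s[0], s[1]), reverse=True)
        return [m for _, _, m in matches[:k]]

    def mean_ratings(self) -> tuple[float, float] | None:
        """Average (safety, liveliness) over all stored memories."""
        if not self.memories:
            return None
        n = len(self.memories)
        return (
            sum(m.safety for m in self.memories) / n,
            sum(m.liveliness for m in self.memories) / n,
        )
```

In use, an agent would call `remember` after each street view image is described to it, and `recall` with a short query (for example, the goal it is walking toward) before choosing its next move. A real system would likely replace the keyword overlap with a more robust relevance measure.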
