Over the past two decades, researchers have made significant strides in simulating agent-based human crowds, yet most efforts remain focused on low-level tasks such as collision avoidance, path following, and flocking. As a result, these approaches often struggle to capture the high-level behaviors that emerge from sustained agent-agent and agent-environment interactions. We introduce Generative Crowds (Gen-C), a generative framework that produces crowd scenarios capturing these interactions and shaping coherent high-level crowd plans. To avoid the labor-intensive process of collecting and annotating real crowd video data, we use Large Language Models (LLMs) to bootstrap synthetic datasets of crowd scenarios. To represent those scenarios, we propose a time-expanded graph structure that encodes actions, interactions, and spatial context. Gen-C employs a dual Variational Graph Autoencoder (VGAE) architecture that jointly learns connectivity patterns and node features conditioned on textual and structural signals. This overcomes the limitations of direct LLM generation and enables scalable, environment-aware multi-agent crowd simulations. We demonstrate the effectiveness of our framework on scenarios with diverse behaviors, such as a University Campus and a Train Station, showing that it generates heterogeneous crowds, coherent interactions, and high-level decision-making patterns consistent with the provided context.
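The time-expanded graph mentioned above can be pictured with a minimal sketch. The abstract does not specify the actual node or edge layout, so everything here is an assumption made for illustration: the class names `Node` and `TimeExpandedGraph`, the helper `build_example`, the choice of one node per agent per time step, and the rule that two agents interact when they share a location at the same step. It is not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """One agent at one time step (hypothetical layout, not from the paper)."""
    agent: str
    step: int
    action: str      # what the agent is doing, e.g. "walk"
    location: str    # spatial context, e.g. "cafeteria"


@dataclass
class TimeExpandedGraph:
    # Nodes are keyed by (agent, step); edges are (source_key, target_key, kind).
    nodes: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)

    def add_node(self, agent, step, action, location):
        self.nodes[(agent, step)] = Node(agent, step, action, location)

    def link(self):
        """Rebuild all edges from the current nodes."""
        self.edges.clear()
        keys = sorted(self.nodes)
        # Temporal edges: the same agent at consecutive time steps.
        for agent, step in keys:
            if (agent, step + 1) in self.nodes:
                self.edges.add(((agent, step), (agent, step + 1), "temporal"))
        # Interaction edges (assumed rule): different agents that share
        # a location at the same time step.
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                same_step = a[1] == b[1]
                same_place = self.nodes[a].location == self.nodes[b].location
                if same_step and same_place:
                    self.edges.add((a, b, "interaction"))
        return self


def build_example():
    """Two illustrative campus agents over two time steps."""
    g = TimeExpandedGraph()
    g.add_node("student_1", 0, "walk", "entrance")
    g.add_node("student_1", 1, "talk", "cafeteria")
    g.add_node("student_2", 0, "sit", "library")
    g.add_node("student_2", 1, "talk", "cafeteria")
    return g.link()


if __name__ == "__main__":
    for edge in sorted(build_example().edges):
        print(edge)
```

In this toy layout, node attributes carry the actions and spatial context, temporal edges carry each agent's plan through time, and interaction edges carry agent-agent contact. A generative model over such graphs would need to produce both the edge set and the node attributes, which is roughly the split the dual VGAE description suggests.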