WhatIf: Interactive Exploration of LLM-Powered Social Simulations for Policy Reasoning

Policymakers in domains such as emergency management, public health, and urban planning must make decisions under deep uncertainty, where outcomes depend on how large populations interpret information, coordinate, and adapt over time. Existing tools only partially support this process: tabletop exercises enable collaborative discussion but lack dynamic feedback, while computational simulations capture population dynamics but are designed for offline analysis. We present WhatIf, an interactive system that enables policymakers to steer, inspect, and compare LLM-powered social simulations in real time. Informed by a formative study in emergency preparedness planning, we derive four design requirements for interactive policy simulations: fluid steering, real-time scale, collaborative exploration, and multi-level interpretability. We developed WhatIf guided by these requirements and evaluated it with five preparedness professionals across three disaster evacuation scenarios. Our findings show that participants used the system as a space for iterative branching and comparison rather than as a tool for evaluating fixed plans; reflected on tacit planning assumptions when agent behavior violated their expectations; surfaced previously unrecognized planning vulnerabilities; and grounded their reasoning in inspectable agent-level cases rather than in aggregate outputs alone. These findings suggest a broader design implication for LLM-powered social simulation systems: designing them as interactive, shared reasoning environments, rather than as offline predictive tools, can better support expert decision-making under deep uncertainty.
