LLMs are increasingly deployed to simulate social interactions, yet many existing simulators remain ad hoc and monolithic. This lack of architectural standardization hinders reproducible research and complicates downstream evaluation. We advance a rigorous science of LLM-based multi-agent simulation by modularizing its core components into Environments, Agents, Simulation engines, and Evaluation metrics (EASE). We demonstrate the utility of the EASE configuration by wrapping it in an experimental study schema that orchestrates workflows around explicit research questions posed in generated scenarios. We contribute SiliSocS, an open-source, research-ready Silicon Society Sandbox that implements a study-structured EASE configuration to enable highly configurable and reproducible LLM-based social simulations. Using SiliSocS and EASE, we present three case studies that, respectively, comprehensively assess existing questions, probe complex questions in greater depth, and extend existing studies. Together, these case studies highlight the limitations of current modeling approaches and isolate the impact of design choices on key results.
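The four-way EASE decomposition and the study wrapper can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the class names (`Environment`, `Agent`, `Study`), the `round_robin_engine` function, and the `agree_rate` metric are illustrative assumptions, not SiliSocS's actual API. The agents also use fixed rule-based policies where a real simulator would call an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Environment:
    """E: shared world state, here just a public message log (hypothetical)."""
    log: List[str] = field(default_factory=list)

    def observe(self) -> List[str]:
        # Agents see a copy of the full log, so they cannot mutate it directly.
        return list(self.log)

    def apply(self, agent_name: str, action: str) -> None:
        self.log.append(f"{agent_name}: {action}")


@dataclass
class Agent:
    """A: stand-in for an LLM-backed agent; `policy` is where a model call would go."""
    name: str
    policy: Callable[[List[str]], str]

    def act(self, observation: List[str]) -> str:
        return self.policy(observation)


def round_robin_engine(env: Environment, agents: List[Agent], steps: int) -> Environment:
    """S: simulation engine; every agent acts once per step, in a fixed order."""
    for _ in range(steps):
        for agent in agents:
            env.apply(agent.name, agent.act(env.observe()))
    return env


def agree_rate(env: Environment) -> float:
    """E: evaluation metric; fraction of logged actions that are exactly 'agree'."""
    actions = [entry.split(": ", 1)[1] for entry in env.log]
    return sum(a == "agree" for a in actions) / len(actions) if actions else 0.0


@dataclass
class Study:
    """Hypothetical study wrapper: one research question bound to one EASE configuration."""
    question: str
    make_env: Callable[[], Environment]
    agents: List[Agent]
    engine: Callable[[Environment, List[Agent], int], Environment]
    metrics: Dict[str, Callable[[Environment], float]]
    steps: int = 3

    def run(self) -> Dict[str, float]:
        # Fresh environment per run so repeated runs do not share state.
        env = self.engine(self.make_env(), self.agents, self.steps)
        return {name: metric(env) for name, metric in self.metrics.items()}


study = Study(
    question="Does an always-agreeing agent keep overall agreement high?",
    make_env=Environment,
    agents=[
        # Rule-based policies replace LLM calls so the sketch runs offline.
        Agent("alice", lambda obs: "agree" if len(obs) % 3 == 0 else "disagree"),
        Agent("bob", lambda obs: "agree"),
    ],
    engine=round_robin_engine,
    metrics={"agree_rate": agree_rate},
)

print(study.question, study.run())
```

Because each component is passed in separately, one of them (for example, the engine or a metric) can be swapped while the others are held fixed. Under these assumptions, that is the kind of controlled comparison a study-structured configuration is meant to make routine.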