Agent-based simulation with a synthetic population lets us compare treatment conditions while holding everything else constant within the same population (i.e., treating the simulated individuals as digital twins). Such population-scale simulations require substantial computational power (i.e., CPU resources) to produce accurate estimates of treatment effects. Metamodels fitted to the simulation results can circumvent the need to simulate every treatment condition. A crucial problem is then selecting the best estimation model at a given sample size (number of simulation runs), because a method's estimation accuracy can change significantly with the sample size. In this paper, we compare different methods to determine which model works best at a specific sample size. In addition to the empirical results, we provide a mathematical analysis of the mean squared error (MSE) equation, showing how its components determine which model to select and why a given method behaves as it does over a range of sample sizes. The analysis shows why the direction estimation method outperforms model-based methods at larger sample sizes and how the between-group and within-group variation affect the MSE equation.
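As a rough illustration of the kind of trade-off the MSE analysis addresses, the standard bias-variance decomposition can be written as below. This is a textbook sketch, not the paper's own equation: the symbols $\hat{\tau}$ (an estimator of the treatment effect $\tau$), $n$ (number of simulation runs), $\sigma_w^2$ (within-group variance), $b$ (a metamodel's misspecification bias), and $v(n)$ (a metamodel's variance) are all assumptions introduced here, and the role of between-group variation in the paper's equation is not reproduced.

```latex
% Standard decomposition of the mean squared error of an estimator
\begin{align}
  \mathrm{MSE}(\hat{\tau})
    &= \mathbb{E}\!\left[(\hat{\tau} - \tau)^2\right]
     = \underbrace{\left(\mathbb{E}[\hat{\tau}] - \tau\right)^2}_{\text{bias}^2}
     + \underbrace{\mathrm{Var}(\hat{\tau})}_{\text{variance}} \\[4pt]
  % Assumed: an unbiased estimate that averages n independent runs
  \mathrm{MSE}_{\text{unbiased mean}}(n)
    &= 0 + \frac{\sigma_w^2}{n} \\[4pt]
  % Assumed: a metamodel whose bias b does not vanish as n grows
  \mathrm{MSE}_{\text{model}}(n)
    &= b^2 + v(n)
\end{align}
```

Under these assumptions, a metamodel can have the lower MSE at small $n$ when $v(n) < \sigma_w^2 / n$ by a margin larger than $b^2$, while for large $n$ the term $\sigma_w^2 / n$ shrinks toward zero and any fixed $b^2 > 0$ dominates, so the unbiased estimate eventually wins.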