Simulating group-level user behavior enables scalable counterfactual evaluation of merchant strategies without costly online experiments. However, building a trustworthy simulator faces two structural challenges. First, information incompleteness: when unobserved factors such as offline context and implicit habits are missing, reasoning-based simulators tend to over-rationalize. Second, mechanism duality: a faithful simulator must capture both interpretable preferences and implicit statistical regularities, which no single paradigm achieves alone. We propose Policy-Guided Hybrid Simulation (PGHS), a dual-process framework that mines transferable decision policies from behavioral trajectories and uses them as a shared alignment layer. This layer anchors two branches: an LLM-based reasoning branch that prevents over-rationalization and an ML-based fitting branch that absorbs implicit regularities. The group-level predictions of the two branches are then fused so that each corrects the other's errors. We deploy PGHS on Meituan with 101 merchants and over 26,000 trajectories. PGHS achieves a group simulation error of 8.80%, improving over the best reasoning-based and fitting-based baselines by 45.8% and 40.9%, respectively.