We analyze a lightweight simulation-based inference method that infers simulator parameters using only a regression-based projection of the observed data. After fitting a surrogate linear regression once, the procedure simulates small batches at the proposed parameter values and assigns kernel weights based on the resulting batch-residual discrepancy, producing a self-normalized pseudo-posterior that is simple, parallelizable, and requires access only to the fitted regression coefficients rather than raw observations. We formalize the construction as an importance-sampling approximation to a population target that averages over simulator randomness, prove consistency as the number of parameter draws grows, and establish stability in estimating the surrogate regression from finite samples. We then characterize the asymptotic concentration as the batch size increases and the bandwidth shrinks, showing that the pseudo-posterior concentrates on an identified set determined by the chosen projection, thereby clarifying when the method yields point versus set identification. Experiments on a tractable nonlinear model and on a cosmological calibration task using the DREAMS simulation suite illustrate the computational advantages of regression-based projections and the identifiability limitations arising from low-information summaries.
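The procedure described above can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the toy simulator `simulate` (a linear model with Gaussian noise), the uniform prior on the parameter, the Gaussian kernel, and the particular discrepancy (the norm of the batch-averaged OLS score evaluated at the fitted coefficients). The abstract does not give the exact form of the batch-residual discrepancy, so this is one plausible reading rather than the authors' definition.

```python
import math
import random


def fit_ols(xs, ys):
    """Closed-form OLS for y ~ a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def simulate(theta, m, rng):
    """Hypothetical toy simulator (assumption): y = theta * x + noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    ys = [theta * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def discrepancy(coef, xs, ys):
    """Assumed discrepancy: norm of the batch-averaged OLS score,
    i.e. residuals of the simulated batch under the fitted coefficients,
    averaged against the regressors (1, x)."""
    a, b = coef
    m = len(xs)
    res = [y - (a + b * x) for x, y in zip(xs, ys)]
    g0 = sum(res) / m
    g1 = sum(r * x for r, x in zip(res, xs)) / m
    return math.hypot(g0, g1)


def pseudo_posterior(coef, n_draws, batch_size, bandwidth, rng):
    """Draw parameters from an assumed uniform prior, simulate one small
    batch per draw, and return self-normalized Gaussian-kernel weights."""
    thetas = [rng.uniform(-3.0, 3.0) for _ in range(n_draws)]
    raw = []
    for th in thetas:
        xs, ys = simulate(th, batch_size, rng)
        d = discrepancy(coef, xs, ys)
        raw.append(math.exp(-0.5 * (d / bandwidth) ** 2))
    total = sum(raw)
    return thetas, [w / total for w in raw]


rng = random.Random(0)
# The "observed" data enter only through the fitted coefficients.
obs_x, obs_y = simulate(1.5, 500, rng)
coef = fit_ols(obs_x, obs_y)
thetas, weights = pseudo_posterior(coef, 2000, 50, 0.05, rng)
post_mean = sum(t * w for t, w in zip(thetas, weights))
print("pseudo-posterior mean:", post_mean)
```

In this toy setting the slope of the projection pins down the parameter, so the weights concentrate around a single value. If the hypothetical simulator depended on the parameter only through its square, the same projection could not separate the two signs and the weights would concentrate on a two-point set, which is the kind of set identification the abstract refers to.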