MA-SBI: Misspecification-Aware Simulation-Based Inference via Side-Channel Guidance

Simulation-based inference (SBI) of latent parameters is often hindered by simulator misspecification, the mismatch between simulated and real-world observations caused by inherent modeling simplifications. RoPE, the recent state of the art for robust SBI, addresses this through optimal transport between learned representations of real and simulated observations, but requires calibration pairs with ground-truth parameters, which are typically unavailable in the very settings where SBI is needed. What practitioners do have is unstructured side-information such as regime labels, instruction text, and policy bulletins. We propose Misspecification-Aware Simulation-Based Inference (MA-SBI), a calibration-free framework that turns this side-channel into a posterior correction. A learned corrector maps side-channel text to an observation-space shift that is applied before any pre-trained amortized posterior, requiring no retraining and no ground-truth parameters. Our main theorem bounds the achievable bias reduction by the mutual information between the misspecification and the side-channel, with a non-vacuous constant that extends to all sub-Gaussian noise via Donsker-Varadhan. On hide-the-calibration benchmarks, MA-SBI with text alone matches the oracle posterior across 10 seeds and two backbones (TOST equivalence), while RoPE given more data does not. The two approaches are complementary: where misspecification is structural and recoverable from parameter pairs, RoPE dominates, as the theory predicts. A stochastic variant improves posterior-predictive log-likelihood on real COVID and OxCGRT epidemiological data, and correctly leaves the posterior unchanged on a well-specified cognitive-science corpus.
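The correction step described above (side-channel in, observation-space shift out, frozen posterior afterwards) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the paper's implementation: the side-channel is a one-hot regime label rather than text, the corrector is a hypothetical linear map `W`, the shift is added to the observation (the sign convention is ours), and an analytic conjugate Gaussian posterior stands in for a pre-trained amortized posterior.

```python
import numpy as np

def corrector(side_features, W):
    """Hypothetical linear corrector: side-channel features -> observation-space shift."""
    return side_features @ W

def gaussian_posterior(x, prior_var=1.0, noise_var=0.25):
    """Stand-in for a frozen amortized posterior q(theta | x).

    Toy conjugate model (assumed): theta ~ N(0, prior_var), x | theta ~ N(theta, noise_var).
    Returns the posterior mean (per observation) and the shared posterior variance.
    """
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    post_mean = post_var * x / noise_var
    return post_mean, post_var

def corrected_posterior(x_obs, side_features, W):
    """Shift the observation first, then query the unchanged posterior."""
    shift = corrector(side_features, W)
    return gaussian_posterior(x_obs + shift)

# Toy data: the real observations carry a regime-dependent offset
# that the simulator does not model.
rng = np.random.default_rng(0)
theta = rng.normal(size=5)
x_clean = theta + rng.normal(scale=0.5, size=5)   # what the simulator would produce
regime = np.array([0, 1, 1, 0, 1])                # side-channel: regime label
delta = np.array([0.0, 2.0])                      # assumed offset per regime
x_obs = x_clean + delta[regime]                   # misspecified real observation

side = np.eye(2)[regime]                          # one-hot side-channel features
W = -delta                                        # idealised corrector weights (assumed known)

mean_naive, _ = gaussian_posterior(x_obs)
mean_corr, var_corr = corrected_posterior(x_obs, side, W)
mean_oracle, _ = gaussian_posterior(x_clean)
```

In this idealised setting the corrector exactly undoes the offset, so the corrected posterior coincides with the one obtained from uncontaminated observations, while the naive posterior is biased in regime 1. In practice `W` would have to be learned, and how well it can be learned is what the mutual-information bound in the abstract is about.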
