In particle physics, as in many areas of science, parameter inference relies on simulations to bridge the gap between theory and experiment. Recent developments in simulation-based inference have boosted the sensitivity of analyses; however, biases induced by simulation-data mismodeling can be difficult to control within standard inference pipelines. In this work, we propose a Template-Adapted Mixture Model to confront this problem in the context of signal fraction estimation: inferring the population proportion of signal in a mixed sample of signal and background, both of which follow arbitrarily complex distributions. We harness many biased simulations to perform data-driven estimates of each process distribution in the signal region, substantially reducing the bias on the signal fraction due to the domain shift between simulation and reality. We explore different methodological choices, including model selection, feature representation, and statistical method, and apply them to a Gaussian toy example and to a semi-realistic di-Higgs measurement. We find that the presented methods successfully leverage the biased simulations to provide estimates with well-calibrated uncertainties.
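To make the signal fraction estimation problem concrete, the sketch below fits the fraction f in a two-component mixture p(x) = f p_s(x) + (1 - f) p_b(x) by maximum likelihood, and shows how a mismodeled signal template biases the estimate. Everything in it is an illustrative assumption rather than the method proposed here: the unit-width Gaussian templates, the means, the sample size, the EM-based fitter, and the names `sample_mixture` and `fit_signal_fraction` are all hypothetical. It is not the Template-Adapted Mixture Model itself, only a picture of the bias that model is meant to reduce.

```python
import math
import random


def gaussian_pdf(x, mean, sigma=1.0):
    """Density of a normal distribution N(mean, sigma^2) at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def sample_mixture(n, signal_fraction, rng):
    """Draw n events: signal ~ N(1, 1) with probability signal_fraction,
    otherwise background ~ N(-1, 1). (Assumed toy distributions.)"""
    return [
        rng.gauss(1.0, 1.0) if rng.random() < signal_fraction else rng.gauss(-1.0, 1.0)
        for _ in range(n)
    ]


def fit_signal_fraction(data, signal_mean, background_mean, n_iter=100):
    """Maximum-likelihood estimate of f for fixed templates, using EM.

    The templates are held fixed, so the only free parameter is f and each
    EM step sets f to the mean posterior signal probability of the events.
    """
    # Template densities do not change between iterations: evaluate them once.
    ps = [gaussian_pdf(x, signal_mean) for x in data]
    pb = [gaussian_pdf(x, background_mean) for x in data]
    f = 0.5
    for _ in range(n_iter):
        f = sum(f * s / (f * s + (1.0 - f) * b) for s, b in zip(ps, pb)) / len(data)
    return f


rng = random.Random(0)  # fixed seed so the toy is reproducible
true_fraction = 0.3
data = sample_mixture(4000, true_fraction, rng)

# Templates that match the data-generating process.
f_matched = fit_signal_fraction(data, signal_mean=1.0, background_mean=-1.0)
# A "simulation" whose signal template is shifted away from the truth.
f_shifted = fit_signal_fraction(data, signal_mean=1.5, background_mean=-1.0)

print(f"true f = {true_fraction}, matched fit = {f_matched:.3f}, shifted fit = {f_shifted:.3f}")
```

With matched templates the fit recovers a value close to the true fraction, while the shifted signal template pulls the estimate low. That is the kind of simulation-induced bias that a data-driven adaptation of the templates aims to remove.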