Multijet events with heavy flavors are of central importance at the LHC, since many relevant processes, such as $t\bar t$, $hh$, $t\bar t h$ and others, have a sizable branching ratio into this final state. Current techniques for tackling these processes use hard-assignment selections through $b$-tagging working points, and suffer from systematic uncertainties arising from difficulties in Monte Carlo simulation. We develop a flexible Bayesian mixture model approach to simultaneously infer the $b$-tagging score distributions and the flavor mixture composition of the dataset. We model multidimensional jet events and, to enhance estimation efficiency, design structured priors that leverage the continuity and unimodality of the $b$-tagging score distributions. Remarkably, our method requires no parametric assumption and is robust against model misspecification: it works for arbitrarily flexible continuous curves and performs better if they are unimodal. We have run a toy inference with signal $bbbb$ and backgrounds $bbcc$ and $cccc$, and find that with a few hundred events we can recover the true mixture fractions of the signal and backgrounds, as well as the true $b$-tagging score distribution curves, despite their arbitrary, nonparametric shapes. We discuss prospects for taking these findings into a realistic scenario in a physics analysis. The presented results could be a starting point for a different and novel kind of analysis in multijet events, with a scope competitive with current state-of-the-art analyses. We also discuss the possibility of using these results in general cases of signals and backgrounds with approximately known continuous distributions and/or expected unimodality.
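To make the mixture-model idea concrete, here is a minimal sketch of a toy version of the inference, under assumptions that are ours and not the authors': $b$-tagging scores are binned into 10 bins, each flavor's score distribution is a categorical over those bins with a plain Dirichlet prior (a stand-in for the structured continuity/unimodality priors described above), and the "true" curves are hypothetical Gaussian-shaped histograms. A Gibbs sampler alternates between sampling, for each 4-jet event, its class ($bbbb$, $bbcc$, $cccc$) together with which jets are $b$-jets, and drawing the mixture fractions and the two score histograms from their conjugate Dirichlet posteriors. All names (`simulate`, `gibbs`, `CLASSES`) are hypothetical illustrations.

```python
import itertools

import numpy as np

# Hypothetical toy setup: event class -> number of b-jets among its 4 jets.
CLASSES = {"bbbb": 4, "bbcc": 2, "cccc": 0}
N_JETS = 4


def simulate(n_events, pi, theta_b, theta_c, rng):
    """Draw binned b-tagging scores for toy 4-jet events."""
    n_b = np.array(list(CLASSES.values()))
    z = rng.choice(len(pi), size=n_events, p=pi)
    x = np.empty((n_events, N_JETS), dtype=int)
    for n, k in enumerate(z):
        for j in range(N_JETS):
            theta = theta_b if j < n_b[k] else theta_c
            x[n, j] = rng.choice(len(theta), p=theta)
    return x


def gibbs(x, n_bins, n_iter, rng, alpha=1.0, beta=1.0):
    """Gibbs sampler for mixture fractions pi and binned score pdfs theta_b, theta_c."""
    n_events = x.shape[0]
    # Enumerate every (class, which-jets-are-b) option; within a class,
    # each choice of b-jet subset is equally likely a priori.
    ks, masks, prior = [], [], []
    for k, nb in enumerate(CLASSES.values()):
        subsets = list(itertools.combinations(range(N_JETS), nb))
        for s in subsets:
            ks.append(k)
            masks.append(np.isin(np.arange(N_JETS), s))
            prior.append(1.0 / len(subsets))
    ks, masks, prior = np.array(ks), np.array(masks), np.array(prior)

    n_classes = len(CLASSES)
    pi = np.full(n_classes, 1.0 / n_classes)
    # Start b-jets at high scores and c-jets at low scores to fix the labels.
    ramp = np.arange(1, n_bins + 1, dtype=float)
    theta_b, theta_c = ramp / ramp.sum(), ramp[::-1] / ramp.sum()
    trace = []
    for _ in range(n_iter):
        lb, lc = theta_b[x], theta_c[x]  # per-jet likelihoods, (n_events, N_JETS)
        # Posterior weight of each option for each event, normalised per event.
        w = np.stack([pi[k] * p * np.prod(np.where(m, lb, lc), axis=1)
                      for k, m, p in zip(ks, masks, prior)])
        w /= w.sum(axis=0, keepdims=True)
        # Sample one option per event by inverting the cumulative weights.
        u = rng.random(n_events)
        choice = np.minimum((np.cumsum(w, axis=0) < u).sum(axis=0), len(ks) - 1)
        is_b = masks[choice]  # sampled jet flavours, (n_events, N_JETS)
        # Conjugate Dirichlet updates given the sampled labels.
        pi = rng.dirichlet(alpha + np.bincount(ks[choice], minlength=n_classes))
        theta_b = rng.dirichlet(beta + np.bincount(x[is_b], minlength=n_bins))
        theta_c = rng.dirichlet(beta + np.bincount(x[~is_b], minlength=n_bins))
        trace.append(pi)
    return np.array(trace), theta_b, theta_c


# Usage on hypothetical toy data: a few hundred events, as in the text.
rng = np.random.default_rng(0)
centers = (np.arange(10) + 0.5) / 10
theta_b_true = np.exp(-0.5 * ((centers - 0.8) / 0.15) ** 2)
theta_b_true /= theta_b_true.sum()
theta_c_true = np.exp(-0.5 * ((centers - 0.3) / 0.2) ** 2)
theta_c_true /= theta_c_true.sum()
x = simulate(600, [0.2, 0.3, 0.5], theta_b_true, theta_c_true, rng)
trace, theta_b, theta_c = gibbs(x, n_bins=10, n_iter=400, rng=rng)
pi_hat = trace[200:].mean(axis=0)  # posterior mean after burn-in
print(dict(zip(CLASSES, pi_hat.round(3))))
```

Binning plus Dirichlet priors keeps every update conjugate, which is why the sketch stays short; it does not enforce continuity or unimodality, so it should be read as the unstructured baseline that such structured priors would improve upon.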