In an operational setting, the covers used by a steganographer are likely to come from different sensors and different processing pipelines than those used by researchers to train their steganalysis models. A performance gap is therefore unavoidable on out-of-distribution covers, an extremely frequent scenario called Cover Source Mismatch (CSM). Here, we explore a grid of processing pipelines to study the origins of CSM, to better understand it, and to better tackle it. A greedy set-covering algorithm is used to select representative pipelines that minimize the maximum regret between each representative and the pipelines within its set. Our main contribution is a methodology for generating relevant training databases able to tackle operational CSM. Experimental validation shows that, for a given number of training samples, our set-covering selection is a better strategy than selecting random pipelines or using all the available pipelines. Our analysis also shows that parameters such as denoising, sharpening, and downsampling are very important for fostering diversity. Finally, different benchmarks on classical and wild databases show that the extracted databases generalize well. Additional resources are available at github.com/RonyAbecidan/HolisticSteganalysisWithSetCovering.
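To make the selection step concrete, here is a minimal sketch of a greedy set cover over a regret matrix. It is an illustration under stated assumptions, not the authors' implementation: the names `greedy_set_cover`, `max_regret`, `toy_regret`, and the threshold `tau` are hypothetical, `regret[i][j]` is assumed to be the regret incurred when a detector trained on pipeline `i` is tested on pipeline `j`, and the numbers are made up for the example.

```python
def greedy_set_cover(regret, tau):
    """Greedily pick source pipelines until every target pipeline is covered.

    Assumption: source i "covers" target j when regret[i][j] <= tau.
    """
    n_sources = len(regret)
    n_targets = len(regret[0])
    # covers[i] is the set of targets that source i handles within the threshold.
    covers = [
        {j for j in range(n_targets) if regret[i][j] <= tau}
        for i in range(n_sources)
    ]
    uncovered = set(range(n_targets))
    chosen = []
    while uncovered:
        # Pick the source covering the most still-uncovered targets
        # (ties go to the lowest index).
        best = max(range(n_sources), key=lambda i: len(covers[i] & uncovered))
        if not covers[best] & uncovered:
            raise ValueError("some pipelines cannot be covered at this threshold")
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


def max_regret(regret, chosen):
    """Worst-case regret when each target uses its best chosen representative."""
    n_targets = len(regret[0])
    return max(min(regret[i][j] for i in chosen) for j in range(n_targets))


# Made-up regret values for four hypothetical pipelines.
toy_regret = [
    [0.00, 0.02, 0.30, 0.25],
    [0.03, 0.00, 0.28, 0.35],
    [0.31, 0.27, 0.00, 0.04],
    [0.26, 0.33, 0.05, 0.00],
]

representatives = greedy_set_cover(toy_regret, tau=0.05)
print(representatives)                          # [0, 2]
print(max_regret(toy_regret, representatives))  # 0.04
```

In this toy case, two representatives are enough to keep every pipeline within the chosen threshold. Lowering `tau` would generally require more representatives, which is the trade-off between database size and worst-case regret.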