A Deep Dive into Function Inlining and its Security Implications for ML-based Binary Analysis

Function inlining is a widely used optimization in modern compilers that replaces a call site with the callee's body as needed. While this transformation improves performance, it significantly alters static features such as machine instructions and control flow graphs, which are crucial to binary analysis. Yet despite this broad effect, the security impact of function inlining remains underexplored. In this paper, we present the first comprehensive study of function inlining through the lens of machine learning-based binary analysis. To this end, we dissect the inlining decision pipeline within LLVM's cost model and explore combinations of compiler options that aggressively push the function inlining ratio beyond standard optimization levels, which we term extreme inlining. We focus on five ML-assisted binary analysis tasks for security, using 20 unique models to systematically evaluate their robustness under extreme inlining scenarios. Our extensive experiments reveal several significant findings: i) function inlining, though benign in intent, can directly or indirectly affect ML model behavior and could be exploited to evade discriminative or generative ML models; ii) ML models relying on static features can be highly sensitive to inlining; iii) subtle compiler settings can be leveraged to deliberately craft evasive binary variants; and iv) inlining ratios vary substantially across applications and build configurations, undermining assumptions of consistency in the training and evaluation of ML models.
