Most static program analyses, including reachability analysis of security vulnerabilities, depend on call graphs (CGs). Static CGs ensure soundness through over-approximation, which inflates their size and reduces their precision. Recent research has employed machine learning (ML) models to prune false edges and improve CG precision. However, these models need real-world programs with high test coverage to generalize effectively, and their inference is expensive. In this paper, we present OriginPruner, a novel call graph pruning technique that leverages the method origin, i.e., the point in a class hierarchy where a method signature is first introduced. Drawing on a localness analysis of the scope of method interactions, OriginPruner confidently identifies and prunes edges related to these origin methods. Our key findings reveal that (1) dominant origin methods, such as Iterator.next, significantly impact CG sizes; (2) derivatives of these origin methods are primarily local, enabling safe pruning without affecting downstream inter-procedural analyses; (3) OriginPruner achieves a significant reduction in CG size while maintaining the soundness of CGs for security applications such as vulnerability propagation analysis; and (4) OriginPruner introduces minimal computational overhead. These findings underscore the potential of leveraging domain knowledge about the type system for more effective CG pruning, offering a promising direction for future work in static program analysis.
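To make the notion of a method origin concrete, the following is a minimal sketch, not OriginPruner's actual implementation. It resolves the origin of a method by walking up a toy class hierarchy and then drops call graph edges whose target derives from a chosen origin method. The hierarchy, the names `SUPERTYPES`, `DECLARED`, `method_origin`, and `prune_edges`, and the hand-picked set of origins to prune are all assumptions made for illustration; the abstract does not specify how dominant and local origins are selected.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy hierarchy: type name -> direct supertypes.
SUPERTYPES: Dict[str, List[str]] = {
    "Iterator": [],
    "ListIterator": ["Iterator"],
    "ArrayListItr": ["ListIterator"],
    "Parser": [],
}

# Hypothetical declarations: type name -> method names it declares.
DECLARED: Dict[str, Set[str]] = {
    "Iterator": {"next", "hasNext"},
    "ListIterator": {"next", "previous"},
    "ArrayListItr": {"next", "hasNext", "previous"},
    "Parser": {"parse"},
}

Method = Tuple[str, str]   # (declaring type, method name)
Edge = Tuple[str, Method]  # (caller label, callee method)


def method_origin(type_name: str, method: str) -> Optional[str]:
    """Return the topmost type in the hierarchy that declares `method`.

    Simplifying assumption: if several supertypes introduce the same
    method independently, the last one visited wins.
    """
    origin = type_name if method in DECLARED.get(type_name, set()) else None
    for parent in SUPERTYPES.get(type_name, []):
        parent_origin = method_origin(parent, method)
        if parent_origin is not None:
            origin = parent_origin
    return origin


def prune_edges(edges: List[Edge], pruned_origins: Set[Method]) -> List[Edge]:
    """Keep only edges whose callee does not derive from a pruned origin."""
    kept = []
    for caller, (callee_type, callee_name) in edges:
        origin = method_origin(callee_type, callee_name)
        if origin is not None and (origin, callee_name) in pruned_origins:
            continue  # callee is a derivative of a pruned origin method
        kept.append((caller, (callee_type, callee_name)))
    return kept


edges: List[Edge] = [
    ("App.main", ("ArrayListItr", "next")),
    ("App.main", ("Parser", "parse")),
    ("App.loop", ("ListIterator", "previous")),
]
# Hand-picked for the example; the selection criteria are not modeled here.
pruned = prune_edges(edges, {("Iterator", "next")})
```

In this sketch, the edge to `ArrayListItr.next` is dropped because its origin resolves to `Iterator.next`, while the edges to `Parser.parse` and `ListIterator.previous` are kept.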