In contemporary software development, indirect calls are widely used to implement dynamic features, which makes it difficult to construct precise control flow graphs (CFGs) and in turn affects the performance of downstream static analysis tasks. Various types of indirect call analyzers have been proposed to tackle this issue. However, they do not fully leverage the semantic information of the program, which limits their effectiveness in real-world scenarios. To address this limitation, this paper proposes Semantic-Enhanced Analysis (SEA), a new approach to improving the effectiveness of indirect call analysis. Our fundamental insight is that, under common programming practices, indirect calls often exhibit semantic similarity with the targets they invoke. This semantic alignment can help static analysis techniques filter out false targets. Notably, contemporary large language models (LLMs) are trained on extensive code corpora that cover tasks such as code summarization, making them well suited for semantic analysis. Specifically, SEA leverages LLMs to generate natural language summaries of both indirect calls and target functions from multiple perspectives. By further analyzing these summaries, SEA determines whether an indirect call and a candidate function form a suitable caller-callee pair. Experimental results demonstrate that SEA can significantly enhance existing static analysis methods by producing more precise target sets for indirect calls.
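The filtering idea described in the abstract can be illustrated with a minimal sketch. It is not SEA's implementation: the names `Candidate`, `filter_targets`, `toy_summarize`, and `toy_is_plausible_pair` are hypothetical, and the two `toy_*` functions are simple keyword-based stand-ins for the LLM summarization and summary-comparison steps, used only so that the example runs without a model. The sketch assumes a static analyzer has already produced an over-approximate candidate target set.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Candidate:
    """A candidate target reported by a static analyzer (hypothetical type)."""
    name: str
    source: str


def filter_targets(
    call_context: str,
    candidates: Iterable[Candidate],
    summarize: Callable[[str], str],
    is_plausible_pair: Callable[[str, str], bool],
) -> List[Candidate]:
    """Keep only candidates whose summary is judged compatible with the call's."""
    call_summary = summarize(call_context)
    kept = []
    for cand in candidates:
        target_summary = summarize(cand.source)
        if is_plausible_pair(call_summary, target_summary):
            kept.append(cand)
    return kept


def toy_summarize(code: str) -> str:
    """Stand-in for an LLM summary: the sorted set of lowercase words in the code."""
    words = re.findall(r"[A-Za-z]+", code.lower())
    return " ".join(sorted(set(words)))


def toy_is_plausible_pair(call_summary: str, target_summary: str) -> bool:
    """Stand-in for the summary comparison: require at least two shared words."""
    shared = set(call_summary.split()) & set(target_summary.split())
    return len(shared) >= 2


def demo() -> List[str]:
    # Assumed inputs: the indirect call site and an over-approximate target set.
    call_context = "handler->on_read(conn, buf);  /* dispatch read event */"
    candidates = [
        Candidate(
            "tcp_on_read",
            "void tcp_on_read(struct conn *conn, char *buf) "
            "{ /* read bytes from socket */ }",
        ),
        Candidate(
            "log_rotate",
            "void log_rotate(const char *path) { /* rotate log file */ }",
        ),
    ]
    kept = filter_targets(
        call_context, candidates, toy_summarize, toy_is_plausible_pair
    )
    return [c.name for c in kept]
```

Calling `demo()` returns `["tcp_on_read"]`: the unrelated `log_rotate` candidate shares no words with the call site and is dropped. In an LLM-based setting, the two callables passed to `filter_targets` would be replaced by model queries; the surrounding filtering loop would stay the same.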