RECON: An LLM-Enhanced Backward Constraint Analysis Framework

While traditional techniques, such as symbolic execution, provide a principled foundation for precise constraint reasoning in program analysis, they struggle to scale to modern software systems mainly due to path explosion, the need for function modeling, and the loss of semantic intent at low-level program representations. In complex execution environments such as Android, characterized by extensive framework interactions and event-driven behavior, these limitations are even more amplified. Thus, in this paper, we present a novel large language model (LLM)-enhanced backward constraint analysis framework that combines the precision of static program analysis with LLM's semantic understanding to extract precise execution constraints from Android bytecode. Our approach, titled RECON, performs backward path discovery from target method(s) to the application entry point(s), discovers method-level control-flow constraints, and leverages LLM reasoning to transform bytecode conditions into interpretable specifications. We evaluated RECON using five LLMs across 78 Android constraint-extraction scenarios and compared it with traditional symbolic execution on real-world applications. Results demonstrate that our approach operates 5.8X faster than traditional symbolic execution, with a 100% success rate, while maintaining logical equivalence and providing significantly more precise and interpretable output. We further evaluated RECON for malware analysis on 100 samples. The results indicate an 84% success rate in generating semantic constraints that lead to the execution of dangerous API behaviors and in detecting complex constraints across multiple execution paths.

翻译：尽管符号执行等传统技术为程序分析中的精确约束推理提供了原则性基础，但由于路径爆炸、函数建模需求以及底层程序表示中语义信息丢失等问题，这些技术难以扩展到现代软件系统。在Android等复杂执行环境中（其特点包括广泛的框架交互和事件驱动行为），这些局限性尤为突出。因此，本文提出一种新颖的大语言模型增强的后向约束分析框架，该框架结合了静态程序分析的精确性与大语言模型的语义理解能力，以从Android字节码中提取精确的执行约束。我们提出的方法名为RECON，通过从目标方法到应用程序入口点的后向路径发现，获取方法级控制流约束，并利用大语言模型推理将字节码条件转化为可解释的规约。我们使用五种大语言模型在78个Android约束提取场景中评估了RECON，并将其与真实世界应用程序上的传统符号执行进行了对比。结果表明，我们的方法比传统符号执行快5.8倍，成功率达到100%，同时保持逻辑等价性，并输出显著更精确和可解释的结果。我们进一步在100个恶意软件样本上评估了RECON，结果显示其在生成导致危险API行为执行的语义约束以及检测跨多条执行路径的复杂约束方面，成功率达到84%。