WebAssembly enables near-native execution in web applications and is increasingly adopted for tasks that demand high performance and robust security. However, its assembly-like syntax, implicit stack machine, and low-level data types make it extremely difficult for human developers to understand, which creates a need for effective WebAssembly reverse engineering techniques. In this paper, we propose StackSight, a novel neurosymbolic approach that combines Large Language Models (LLMs) with advanced program analysis to decompile complex WebAssembly code into readable C++ snippets. StackSight visualizes and tracks virtual stack alterations via a static analysis algorithm and then applies chain-of-thought prompting to harness the LLM's complex reasoning capabilities. Evaluation results show that StackSight significantly improves WebAssembly decompilation. Our user study also demonstrates that code snippets generated by StackSight achieve significantly higher win rates and enable a better grasp of code semantics.
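To make the idea of tracking virtual stack alterations more concrete, the following is a minimal sketch of symbolic stack tracking over straight-line WebAssembly text instructions. It is not StackSight's actual algorithm, which the abstract does not detail. The function name `track_stack`, the `local_N` naming scheme, the annotation format, and the small instruction subset are all assumptions chosen for illustration.

```python
# Illustrative sketch only: NOT StackSight's actual static analysis.
# Assumption: straight-line code using a tiny subset of i32 instructions.
BINARY_OPS = {"i32.add": "+", "i32.sub": "-", "i32.mul": "*"}


def track_stack(instructions):
    """Symbolically execute instructions and annotate each with the stack state.

    Returns (annotated_lines, final_stack). Stack entries are strings that
    describe how each value was computed.
    """
    stack = []
    annotated = []
    for instr in instructions:
        parts = instr.split()
        op = parts[0]
        if op == "local.get":
            # Push a symbolic name for the local variable.
            stack.append(f"local_{parts[1]}")
        elif op == "i32.const":
            # Push the literal constant.
            stack.append(parts[1])
        elif op in BINARY_OPS:
            # The right operand is on top of the stack, the left one below it.
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(f"({lhs} {BINARY_OPS[op]} {rhs})")
        elif op == "local.set":
            # Pop the top value and record the assignment it implies.
            value = stack.pop()
            annotated.append(
                f"{instr}  ;; local_{parts[1]} = {value}; stack: {stack}"
            )
            continue
        else:
            raise ValueError(f"unsupported instruction: {instr}")
        annotated.append(f"{instr}  ;; stack: {stack}")
    return annotated, stack


if __name__ == "__main__":
    program = [
        "local.get 0",
        "i32.const 2",
        "i32.mul",
        "local.get 1",
        "i32.add",
        "local.set 2",
    ]
    for line in track_stack(program)[0]:
        print(line)
```

Annotations of this kind make the implicit stack explicit, so each instruction can be read alongside the expression it contributes to. In a pipeline like the one the abstract describes, such annotated code could plausibly serve as input to a chain-of-thought prompt, although the exact prompt format is not specified here.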