Software obfuscation, particularly prevalent in JavaScript, hinders code comprehension and analysis, posing significant challenges to software testing, static analysis, and malware detection. This paper introduces CASCADE, a novel hybrid approach that integrates the advanced coding capabilities of Gemini with the deterministic transformation capabilities of a compiler Intermediate Representation (IR), specifically JavaScript IR (JSIR). CASCADE employs Gemini to identify critical prelude functions, the foundational components underlying the most prevalent obfuscation techniques, and then leverages JSIR for the subsequent code transformations. In this way it recovers semantic elements such as original strings and API names, and reveals original program behaviors. The method overcomes limitations of existing static and dynamic deobfuscation techniques, eliminating the need for hundreds to thousands of hardcoded rules while remaining reliable and flexible. CASCADE is already deployed in Google's production environment, where it has substantially improved JavaScript deobfuscation efficiency and reduced reverse engineering effort.
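To make the role of a prelude function concrete, the following toy sketch shows a string-array decoder of the kind common obfuscators emit, and how calls to it can be replaced by the literals they return. It is an illustration only, not CASCADE's implementation: the names `_0xarr`, `_0xdec`, and `inlinePreludeCalls` are hypothetical, the prelude name is supplied by hand where CASCADE would rely on Gemini, and a regular expression stands in for the IR-level transformations that JSIR performs.

```javascript
// Hypothetical prelude: a string table plus a decoder that indexes into it.
const preludeSource = `
var _0xarr = ['log', 'Hello, world', 'console'];
function _0xdec(i) { return _0xarr[i - 0x64]; }
`;

// Hypothetical payload: every string and API name is hidden behind the decoder.
const payloadSource = 'globalThis[_0xdec(0x66)][_0xdec(0x64)](_0xdec(0x65));';

// Replace each call to the prelude function with the string it evaluates to.
// Assumption: the prelude name is already known (the identification step is
// not modeled here) and every call takes a single numeric literal argument.
function inlinePreludeCalls(prelude, payload, preludeName) {
  // Evaluate the prelude to obtain the decoder. This runs the prelude code
  // directly, so a real tool would need a sandbox for untrusted input.
  const decoder = new Function(`${prelude}\nreturn ${preludeName};`)();
  const callPattern = new RegExp(
    `${preludeName}\\((0x[0-9a-fA-F]+|\\d+)\\)`, 'g');
  return payload.replace(callPattern, (_match, arg) =>
    JSON.stringify(decoder(Number(arg))));
}

const recovered = inlinePreludeCalls(preludeSource, payloadSource, '_0xdec');
console.log(recovered);
// prints: globalThis["console"]["log"]("Hello, world");
```

Once the decoder calls are inlined, the original string and the `console.log` API name become visible, which is the kind of semantic recovery the abstract describes. A text-level rewrite like this is fragile (it ignores scoping, aliasing, and computed arguments), which is one reason to perform such transformations on an IR instead.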