OBsmith: LLM-Powered JavaScript Obfuscator Testing

JavaScript obfuscators are widely deployed to protect intellectual property and resist reverse engineering, yet their correctness has received far less attention than their performance and resilience. Existing evaluations typically measure resistance to deobfuscation. They leave unanswered the critical question of whether obfuscators preserve program semantics. Incorrect transformations can silently alter functionality, compromise reliability, and erode security, undermining the very purpose of obfuscation. To address this gap, we present OBsmith, a novel framework that uses large language models (LLMs) to systematically test JavaScript obfuscators. OBsmith leverages LLMs to generate program sketches: abstract templates capturing diverse language constructs, idioms, and corner cases. These sketches are instantiated into executable programs, which are then obfuscated under different configurations. Besides LLM-powered sketching, OBsmith uses a second source of sketches: automatic extraction from real programs. This extraction path enables more focused testing of project-specific features and lets developers inject domain knowledge into the resulting test cases. OBsmith uncovered 11 previously unknown correctness bugs. Under an equal program budget, five state-of-the-art general-purpose JavaScript fuzzers (FuzzJIT, Jsfunfuzz, Superion, DIE, Fuzzilli) failed to detect these issues, highlighting OBsmith's complementary focus on obfuscation-induced misbehavior. An ablation study shows that every component except our generic metamorphic relations (MRs) contributes to at least one bug class; this negative result suggests the need for obfuscator-specific MRs. Our results also inform the discussion of how to balance obfuscation presets against performance cost. We envision OBsmith as an important step toward automated testing and quality assurance of obfuscators and other semantics-preserving toolchains.
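The pipeline described above (extract or generate a sketch, instantiate it, obfuscate it, compare behavior) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration only. The `??NUM`/`??STR`/`??BOOL` hole syntax, the literal pools, and the helpers `extract_sketch`, `instantiate`, `run_node`, `semantics_preserved`, and `check` are hypothetical and are not OBsmith's API. The obfuscator under test is passed in as a plain function, and `run_node` assumes Node.js is on the PATH.

```python
import random
import re
import subprocess
from dataclasses import dataclass
from typing import Callable

# Hypothetical hole syntax for sketches: ??NUM, ??STR, ??BOOL.
# This is an illustrative assumption, not OBsmith's actual sketch format.
HOLE = re.compile(r"\?\?(NUM|STR|BOOL)")

# Literal pools biased toward JavaScript corner cases (illustrative choices).
# Negative values and expressions are parenthesized so they stay valid in any hole.
POOLS = {
    "NUM": ["0", "(-0)", "1", "(-1)", "NaN", "(2 ** 53)", "(0.1 + 0.2)"],
    "STR": ['""', '"a"', '"\\u0000"', '"__proto__"'],
    "BOOL": ["true", "false"],
}

# Numeric literals that are not part of an identifier or member access.
NUM_LITERAL = re.compile(r"(?<![\w$.])\d+(?:\.\d+)?(?![\w.])")


def extract_sketch(source: str) -> str:
    """Abstract the numeric literals of a real program into ??NUM holes.

    Regex-based for brevity: it also rewrites digits inside strings and
    comments, so a real extractor would walk a parsed AST instead.
    """
    return NUM_LITERAL.sub("??NUM", source)


def instantiate(sketch: str, rng: random.Random) -> str:
    """Fill every hole with a randomly chosen literal of the matching kind."""
    return HOLE.sub(lambda m: rng.choice(POOLS[m.group(1)]), sketch)


@dataclass(frozen=True)
class Run:
    stdout: str
    threw: bool  # nonzero exit status, e.g. from an uncaught exception


def run_node(source: str) -> Run:
    """Execute a program with Node.js (assumes `node` is on the PATH)."""
    proc = subprocess.run(
        ["node", "-e", source], capture_output=True, text=True, timeout=10
    )
    return Run(proc.stdout, proc.returncode != 0)


def semantics_preserved(original: Run, obfuscated: Run) -> bool:
    """Differential oracle: stdout and exception status must both match.

    stderr is deliberately ignored, because stack traces legitimately
    differ once the obfuscator renames identifiers.
    """
    return original == obfuscated


def check(sketch: str, obfuscate: Callable[[str], str], seed: int = 0) -> bool:
    """Instantiate a sketch, obfuscate it, and compare both executions.

    `obfuscate` is supplied by the caller, e.g. a wrapper that runs the
    obfuscator under test with one particular configuration preset.
    """
    program = instantiate(sketch, random.Random(seed))
    return semantics_preserved(run_node(program), run_node(obfuscate(program)))
```

With Node.js installed, `check("console.log(??NUM + 1);", lambda src: src)` exercises the loop with an identity "obfuscator" and should report that semantics are preserved. Swapping in a wrapper around a real obfuscator and iterating over seeds and configuration presets turns it into a crude differential tester. Exact-output comparison is the simplest oracle; the metamorphic relations mentioned in the abstract would be additional, more specific checks on top of it.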
