Cyberattacks on industrial control systems and cyber-physical systems can cause catastrophic real-world damage by infecting device binaries with malware. Mitigating such attacks benefits from reverse engineering tools that recover semantic knowledge of the code, in particular the mathematical operations it implements. Conventional reverse engineering tools can decompile binaries to low-level code but offer little semantic insight. This paper proposes REMaQE, an automated framework for reverse engineering math equations from binary executables. REMaQE uses symbolic execution for dynamic analysis of the binary to extract the relevant semantic knowledge of the implemented algorithms. Its automatic parameter analysis pass also leverages symbolic execution to identify the input, output, and constant parameters of the implemented math equations. REMaQE automatically handles parameters accessed via registers, the stack, global memory, or pointers, and supports reverse engineering of object-oriented implementations such as C++ classes. An algebraic simplification method allows REMaQE to scale to complex conditional equations. These features distinguish REMaQE from existing approaches to reverse engineering math equations. On a dataset of randomly generated math equations compiled to binaries from C and Simulink implementations, REMaQE recovers a semantically matching equation for 97.53% of the models. For complex equations with more operations, accuracy stays above 94%. REMaQE executes in 0.25 seconds on average and in 1.3 seconds for more complex equations. This real-time execution speed enables smooth integration into an interactive, mathematics-oriented reverse engineering workflow.
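To make the idea of recovering an equation through symbolic execution and algebraic simplification concrete, the following is a minimal sketch. It is not REMaQE's implementation: the three-address instruction format, the register names, and the helpers `run_symbolic`, `simplify`, and `show` are all hypothetical, and no real binary is lifted or analyzed. The sketch only shows how running instructions over symbolic inputs, rather than concrete values, leaves an expression tree in the output register.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical expression tree; none of these names come from REMaQE.
@dataclass(frozen=True)
class Sym:
    name: str

@dataclass(frozen=True)
class Const:
    value: float

@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[Sym, Const, BinOp]

# Concrete semantics used only for constant folding.
FOLD = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def simplify(op: str, left: Expr, right: Expr) -> Expr:
    """Build op(left, right), applying a few algebraic identities."""
    if isinstance(left, Const) and isinstance(right, Const):
        return Const(FOLD[op](left.value, right.value))
    if op in ("+", "-") and right == Const(0.0):
        return left                      # x + 0 = x, x - 0 = x
    if op == "+" and left == Const(0.0):
        return right                     # 0 + x = x
    if op == "*" and right == Const(1.0):
        return left                      # x * 1 = x
    if op == "*" and left == Const(1.0):
        return right                     # 1 * x = x
    return BinOp(op, left, right)

def run_symbolic(program, inputs):
    """Execute a toy instruction list with symbolic input registers."""
    regs = {name: Sym(name) for name in inputs}
    for instr in program:
        if instr[0] == "li":             # ("li", dst, value): load constant
            _, dst, value = instr
            regs[dst] = Const(float(value))
        else:                            # (op, dst, src1, src2)
            op, dst, a, b = instr
            regs[dst] = simplify(op, regs[a], regs[b])
    return regs

def show(e: Expr) -> str:
    """Render an expression tree as a fully parenthesized string."""
    if isinstance(e, Sym):
        return e.name
    if isinstance(e, Const):
        return f"{e.value:g}"
    return f"({show(e.left)} {e.op} {show(e.right)})"

# Assumed toy program computing out = (2 * x0 + x1 * 1) - 0.
PROGRAM = [
    ("li", "r2", 2),
    ("*", "r3", "r2", "x0"),
    ("li", "r4", 1),
    ("*", "r5", "x1", "r4"),
    ("+", "r6", "r3", "r5"),
    ("li", "r7", 0),
    ("-", "out", "r6", "r7"),
]

recovered = show(run_symbolic(PROGRAM, ["x0", "x1"])["out"])
print(recovered)  # ((2 * x0) + x1)
```

In this sketch the redundant `* 1` and `- 0` operations disappear during execution, so the recovered expression is shorter than the instruction sequence that produced it. Handling conditionals, memory accesses, and parameter discovery, as described above, would need considerably more machinery than this toy shows.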