Sandboxes and other dynamic analysis techniques are now widely used in malware detection systems to improve the detection of zero-day malware. In response, modern malware samples commonly employ techniques of anti-dynamic analysis (TADAs), which cause sandboxes to produce false negatives or to fail during analysis. In such cases, human reverse engineers step in to conduct dynamic analysis manually (e.g., debugging, patching), but this work is obstructed by the same TADAs. In this work, we propose a Large Language Model (LLM) based workflow that pinpoints the locations of TADA implementations in code, helping reverse engineers decide where to place breakpoints during debugging. In our evaluation, the workflow correctly identified the locations of 87.80% of known TADA implementations taken from public repositories. It also pinpointed the locations of TADAs in 4 well-known malware samples that are documented in online malware analysis blogs.