Code metamorphism refers to a programming technique in which a program consistently and automatically modifies its own code (in part or in whole) while retaining its core functionality. The technique is used for online performance optimization and automated crash recovery in certain mission-critical applications, but it has also been misappropriated by malware authors to bypass the signature-based detection measures of anti-malware engines. Current code mutation engines used by threat actors, however, offer only a limited degree of mutation, which is frequently detectable via static code analysis. The advent of large language models (LLMs) such as ChatGPT 4.0 and Google Bard may significantly change this landscape: these models have demonstrated algorithm comprehension and code synthesis capabilities that closely resemble human abilities. This advancement has raised concerns among experts that such models could be exploited by threat actors to generate sophisticated metamorphic malware. This paper explores the potential of several prominent LLMs for software code mutation, which may be used to reconstruct (with mutation) existing malware code bases or to create new forms of embedded mutation engines for next-generation metamorphic malware. In this work, we introduce a framework for creating self-testing program mutation engines based on LLM/Transformer-based models. The proposed framework serves as an essential tool for testing next-generation metamorphic malware detection engines.
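To make the "self-testing mutation engine" idea concrete, the following is a minimal, benign sketch of the mutate-then-verify loop the abstract describes. The LLM-based mutation step is stubbed out with a trivial AST rewrite (the `RenameLocals` transformer, the `SOURCE` sample, and the test oracle are hypothetical illustrations, not part of the paper's framework): the engine rewrites the source, then accepts the mutant only if it still passes the original behavioural tests.

```python
import ast

# Hypothetical target program to be mutated.
SOURCE = """
def add(a, b):
    total = a + b
    return total
"""


class RenameLocals(ast.NodeTransformer):
    """Stand-in 'mutation engine': in the real framework this step would be
    an LLM query; here it merely renames a local variable."""

    MAPPING = {"total": "accum"}

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self.MAPPING.get(node.id, node.id)
        return node


def mutate(src: str) -> str:
    """Produce a syntactically different but semantically equivalent variant."""
    tree = ast.parse(src)
    return ast.unparse(RenameLocals().visit(tree))


def self_test(src: str) -> bool:
    """The 'self-testing' step: execute the mutant and check it against
    known input/output pairs of the original program."""
    namespace: dict = {}
    exec(compile(src, "<mutant>", "exec"), namespace)
    return namespace["add"](2, 3) == 5 and namespace["add"](-1, 1) == 0


mutant = mutate(SOURCE)
assert mutant != ast.unparse(ast.parse(SOURCE))  # the code text changed...
assert self_test(mutant)                         # ...but behaviour is preserved
```

A real engine would replace `RenameLocals` with model-generated rewrites and a richer test oracle, but the accept/reject structure, mutate, then verify functional equivalence before deployment, is the same.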