Learning Joint Embeddings of Function and Process Call Graphs for Malware Detection

Software systems can be represented as different types of graphs, depending on what is extracted and why. Function calls within the software can be captured as function call graphs, which highlight the relationships and dependencies between functions. Alternatively, the processes spawned by the software can be modeled as process interaction graphs, which focus on runtime behavior and inter-process communication. These representations are related, but each captures a distinct perspective of the system and provides complementary insight into its structure and operation. Previous studies have used graph neural networks (GNNs) to analyze software behavior, but most have focused on a single type of graph representation; joint modeling of function call graphs and process interaction graphs remains largely underexplored. In this paper, we present a pipeline for constructing Function Call Graphs (FCGs) and Process Call Graphs (PCGs), and we propose GeminiNet, a unified neural network approach that learns joint embeddings from both. We construct a new dataset of 635 Windows executables (318 malicious and 317 benign), extracting FCGs with Ghidra and PCGs with the Any.Run sandbox. GeminiNet employs dual graph convolutional branches with an adaptive gating mechanism that balances contributions from the static and dynamic views. We demonstrate that the joint embeddings outperform a single-graph model.
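To make the dual-branch idea concrete, the following is a minimal sketch, not the GeminiNet implementation. The abstract does not specify the layer count, the pooling operator, or the exact form of the gate, so everything here is an assumption: one graph-convolution layer per branch with symmetric adjacency normalization, mean pooling, and a per-dimension sigmoid gate over the concatenated graph vectors. All function names, the toy graphs, and the random weights are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 (assumed; common in GCNs)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_branch(adj, feats, weight):
    """One graph-convolution layer with ReLU, then mean pooling to a graph vector."""
    h = matmul(matmul(normalize_adjacency(adj), feats), weight)
    h = [[max(0.0, v) for v in row] for row in h]
    return [sum(col) / len(h) for col in zip(*h)]

def gated_fusion(z_fcg, z_pcg, gate_w, gate_b):
    """Assumed gate: g = sigmoid(W [z_fcg; z_pcg] + b), output g*z_fcg + (1-g)*z_pcg."""
    concat = z_fcg + z_pcg  # list concatenation of the two graph vectors
    fused, gates = [], []
    for k in range(len(z_fcg)):
        logit = sum(w * x for w, x in zip(gate_w[k], concat)) + gate_b[k]
        g = 1.0 / (1.0 + math.exp(-logit))
        gates.append(g)
        fused.append(g * z_fcg[k] + (1.0 - g) * z_pcg[k])
    return fused, gates

rng = random.Random(0)

def rand_matrix(rows, cols):
    """Hypothetical stand-in for learned weights or extracted node features."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Toy static view: three functions in a call chain (treated as undirected here).
fcg_adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
fcg_x = rand_matrix(3, 4)
# Toy dynamic view: a parent process and one child.
pcg_adj = [[0, 1], [1, 0]]
pcg_x = rand_matrix(2, 5)

hidden = 4
z_fcg = gcn_branch(fcg_adj, fcg_x, rand_matrix(4, hidden))
z_pcg = gcn_branch(pcg_adj, pcg_x, rand_matrix(5, hidden))
joint, gates = gated_fusion(z_fcg, z_pcg, rand_matrix(hidden, 2 * hidden), [0.0] * hidden)
print("gates:", gates)
print("joint embedding:", joint)
```

Because each gate value lies strictly between 0 and 1, every coordinate of the joint embedding is a convex combination of the corresponding static and dynamic coordinates. In a trained model the gate weights would be learned, so the balance between views could differ per sample; a classifier head on `joint` would then produce the malicious or benign prediction.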
