Malware authors apply various control flow obfuscation techniques to create new malware variants that evade detection. Existing Siamese neural network (SNN)-based malware detection methods fail to correctly classify malware families when such obfuscated samples are present in the training dataset, resulting in high false-positive rates. To address this issue, we propose a novel task-aware, few-shot-learning-based Siamese neural network that is resilient to malware variants produced by such control flow obfuscation techniques. Using the average entropy features of each malware family as inputs, in addition to the image features, our model generates the parameters of the feature layers, so that the feature embedding is adjusted more accurately for each malware family, each of which includes obfuscated variants. In addition, our method can classify malware classes even when only one or a few training samples are available. Our model applies few-shot learning to features extracted by a pre-trained network (e.g., VGG-16), to avoid the bias typically associated with a model trained on a limited number of training samples. Our approach is highly effective in recognizing unique malware signatures, and thus correctly classifies malware samples that belong to the same family even in the presence of obfuscated variants. Our experimental results, validated with N-way N-shot learning, show that our model achieves a classification accuracy above 91%, which compares favorably with other similar methods.
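The abstract does not state how the average entropy features of a malware family are computed. As a minimal sketch, assuming Shannon entropy over fixed-size byte blocks averaged across a family's samples, the idea could look like the following. The function names, the block size, and the number of blocks are hypothetical choices made for illustration and are not taken from the paper.

```python
import math
from collections import Counter
from typing import List


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (range 0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_profile(data: bytes, block_size: int = 256) -> List[float]:
    """Entropy of each consecutive block (assumed feature layout)."""
    return [
        byte_entropy(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def family_average_entropy(
    samples: List[bytes], block_size: int = 256, n_blocks: int = 4
) -> List[float]:
    """Average the first n_blocks block entropies over a family's samples.

    Samples with fewer than n_blocks blocks are padded with 0.0 so that
    every sample contributes a vector of the same length.
    """
    profiles = []
    for sample in samples:
        profile = entropy_profile(sample, block_size)[:n_blocks]
        profile += [0.0] * (n_blocks - len(profile))
        profiles.append(profile)
    # zip(*profiles) groups the i-th block entropy of every sample together.
    return [sum(column) / len(profiles) for column in zip(*profiles)]
```

Under these assumptions, the resulting per-family vector could be passed, alongside image features from a pre-trained network, to whatever component generates the feature-layer parameters.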