Leveraging LSTM and GAN for Modern Malware Detection

The proliferation of malware endangers cyberspace much as climate change endangers ecosystems. Despite significant investments in cybersecurity technologies and staff training, the global community remains locked in an ongoing war against cyber threats. The polymorphic, constantly changing nature of malware keeps pushing the limits of the detection and mitigation approaches that cybersecurity practitioners rely on. Traditional techniques such as signature-based detection and behavioral analysis are slow to adapt to the rapid evolution of malware variants. This paper therefore proposes using deep learning models, specifically Long Short-Term Memory (LSTM) networks and Generative Adversarial Networks (GANs), to improve both the accuracy and the speed of malware detection. Deep learning architectures that operate directly on raw byte-stream data, a fast-growing, state-of-the-art approach, deliver better accuracy and performance than traditional methods. Combining LSTM and GAN models allows synthetic data to be generated, which expands the training dataset and, in turn, improves detection accuracy. The models are trained and evaluated on the VirusShare dataset, which contains more than one million unique malware samples. After thorough data preparation (tokenization and augmentation) and model training, the LSTM and GAN models outperform standard classifiers. The models reach 98% accuracy, demonstrating the effectiveness of deep learning and its decisive role in proactive cybersecurity defense. In addition, the paper examines ensemble learning and model fusion methods as a way to reduce bias and increase model complexity.
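
The abstract lists tokenization of raw byte-stream data as a preparation step but does not describe the scheme. One common approach, assumed here and not taken from the paper, is to treat each byte value (0 to 255) as a token ID and cut the stream into fixed-length sequences that an LSTM can consume. The helper `tokenize_bytes`, the padding token `PAD_ID`, and the chunking policy below are hypothetical illustrations.

```python
from typing import List, Optional

# Assumption: byte values 0-255 are used directly as token IDs, and one extra
# token (PAD_ID = 256) pads the final chunk. Neither choice comes from the paper.
PAD_ID = 256
VOCAB_SIZE = 257  # 256 byte values + 1 padding token (e.g. for an embedding layer)


def tokenize_bytes(data: bytes, seq_len: int,
                   max_chunks: Optional[int] = None) -> List[List[int]]:
    """Hypothetical helper: split a raw byte stream into fixed-length token sequences.

    Each byte becomes its integer value (0-255). The last chunk is right-padded
    with PAD_ID. If max_chunks is given, any chunks beyond it are dropped.
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    tokens = list(data)  # iterating over bytes yields ints in 0..255
    chunks: List[List[int]] = []
    for start in range(0, len(tokens), seq_len):
        if max_chunks is not None and len(chunks) >= max_chunks:
            break  # truncate very long files to a fixed budget
        chunk = tokens[start:start + seq_len]
        chunk += [PAD_ID] * (seq_len - len(chunk))  # pads only the short last chunk
        chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    # "MZ" is the magic number at the start of Windows PE executables.
    header = b"MZ\x90\x00\x03"
    print(tokenize_bytes(header, seq_len=4))
    # → [[77, 90, 144, 0], [3, 256, 256, 256]]
```

Fixed-length chunks with explicit padding keep every training example the same shape, which simplifies batching; the padding token is kept outside the byte range so the model can distinguish it from a genuine `0x00` byte.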

Related content

Long Short-Term Memory Networks

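Long short-term memory (LSTM) is an artificial recurrent neural network (RNN) architecture used in deep learning. Unlike standard feedforward neural networks, an LSTM has feedback connections. It can process not only single data points (such as images) but also entire sequences of data (such as speech or video). For example, LSTMs are suited to tasks such as unsegmented, connected handwriting recognition, speech recognition, and anomaly detection in network traffic or in IDSs (intrusion detection systems).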
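
To make the "feedback connections" concrete, the following sketch runs a single-unit LSTM over a sequence in plain Python. It follows the standard gate equations (forget, input, and output gates plus a candidate update); the class and parameter names are hypothetical, and a practical malware classifier would use vector-valued states inside a deep learning framework rather than this scalar toy.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass(frozen=True)
class GateParams:
    w: float  # weight on the current input x_t
    u: float  # weight on the previous hidden state h_{t-1} (the feedback path)
    b: float  # bias


@dataclass(frozen=True)
class ScalarLSTM:
    """Hypothetical single-unit LSTM with scalar input, for illustration only."""
    forget_gate: GateParams
    input_gate: GateParams
    output_gate: GateParams
    candidate: GateParams

    def step(self, x: float, h_prev: float, c_prev: float) -> Tuple[float, float]:
        def pre(p: GateParams) -> float:
            return p.w * x + p.u * h_prev + p.b

        f = sigmoid(pre(self.forget_gate))  # how much of the old cell state to keep
        i = sigmoid(pre(self.input_gate))   # how much of the new candidate to write
        o = sigmoid(pre(self.output_gate))  # how much of the cell state to expose
        g = math.tanh(pre(self.candidate))  # candidate value for the cell state
        c = f * c_prev + i * g              # updated cell state (long-term memory)
        h = o * math.tanh(c)                # new hidden state (per-step output)
        return h, c

    def run(self, xs: List[float]) -> List[float]:
        """Process a whole sequence, carrying (h, c) from each step to the next."""
        h, c = 0.0, 0.0
        outputs = []
        for x in xs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


if __name__ == "__main__":
    zero = GateParams(0.0, 0.0, 0.0)
    lstm = ScalarLSTM(forget_gate=zero, input_gate=zero, output_gate=zero,
                      candidate=GateParams(1.0, 0.0, 0.0))
    # The same input twice yields two different outputs, because the cell
    # state c accumulated at step 1 is fed back into step 2.
    print(lstm.run([1.0, 1.0]))
```

The hidden state `h` and cell state `c` returned by one `step` call become the inputs to the next; this recurrence is exactly what a feedforward network lacks, and it is what lets an LSTM relate a byte late in a file to patterns seen earlier.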

Generative Adversarial Networks (GANs): An Overview of Theoretical Model, Evaluation Metrics, and Recent Developments

May 30, 2020