Improving Android Malware Detection Through Data Augmentation Using Wasserstein Generative Adversarial Networks

Generative Adversarial Networks (GANs) have demonstrated their versatility across various applications, including data augmentation and malware detection. This research explores the effectiveness of utilizing GAN-generated data to train a model for the detection of Android malware. Given the considerable storage requirements of Android applications, the study proposes a method to synthetically represent data using GANs, thereby reducing storage demands. The proposed methodology involves creating image representations of features extracted from an existing dataset. A GAN model is then employed to generate a more extensive dataset consisting of realistic synthetic grayscale images. Subsequently, this synthetic dataset is utilized to train a Convolutional Neural Network (CNN) designed to identify previously unseen Android malware applications. The study includes a comparative analysis of the CNN's performance when trained on real images versus synthetic images generated by the GAN. Furthermore, the research explores variations in performance between the Wasserstein Generative Adversarial Network (WGAN) and the Deep Convolutional Generative Adversarial Network (DCGAN). The investigation extends to studying the impact of image size and malware obfuscation on the classification model's effectiveness. The data augmentation approach implemented in this study resulted in a notable performance enhancement of the classification model, ranging from 1.5% to 7%, depending on the dataset. The highest achieved F1 score reached 0.975.

Keywords: Generative Adversarial Networks, Android Malware, Data Augmentation, Wasserstein Generative Adversarial Network
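The abstract does not say how extracted features are laid out as pixels, so the following is only a minimal sketch of one plausible encoding: a feature vector is min-max scaled to the 0-255 range, zero-padded, and reshaped into a square grayscale image. The function name `features_to_grayscale`, the padding choice, and the scaling rule are all assumptions made for illustration, not the authors' method.

```python
import math


def features_to_grayscale(features, side=None):
    """Pack a 1-D feature vector into a square grayscale image.

    Hypothetical helper: the pixel layout (row-major, zero-padded) and
    the min-max scaling are assumptions, not taken from the paper.
    Returns a list of `side` rows, each a list of ints in 0..255.
    """
    values = [float(v) for v in features]
    if not values:
        raise ValueError("features must not be empty")

    # Smallest square that can hold every feature: ceil(sqrt(n)).
    if side is None:
        side = math.isqrt(len(values) - 1) + 1
    if len(values) > side * side:
        raise ValueError("feature vector does not fit in a side x side image")

    # Min-max scale to 0..255; a constant vector maps to all zeros.
    lo, hi = min(values), max(values)
    span = hi - lo
    pixels = [round((v - lo) / span * 255) if span else 0 for v in values]

    # Zero-pad the tail so the vector fills the whole square.
    pixels += [0] * (side * side - len(pixels))

    # Split the flat pixel list into rows.
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


# Example: five binary features (e.g. permission flags) become a 3x3 image.
image = features_to_grayscale([0, 1, 1, 0, 1])
```

Passing an explicit `side` would be one way to experiment with image size, a factor the abstract says was studied; the sizes the study actually used are not given here.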

