Generative Adversarial Networks (GANs) have proven versatile across applications, including data augmentation and malware detection. This research examines how effectively GAN-generated data can train a model to detect Android malware. Because Android applications require considerable storage, the study proposes representing the data synthetically with GANs, thereby reducing storage demands. The methodology first creates image representations of features extracted from an existing dataset. A GAN then generates a larger dataset of realistic synthetic grayscale images, and this synthetic dataset is used to train a Convolutional Neural Network (CNN) to identify previously unseen Android malware applications. The study compares the CNN's performance when trained on real images with its performance when trained on GAN-generated synthetic images. It also compares the Wasserstein Generative Adversarial Network (WGAN) with the Deep Convolutional Generative Adversarial Network (DCGAN), and examines how image size and malware obfuscation affect the classification model. The data augmentation approach improved the classification model's performance by 1.5% to 7%, depending on the dataset, and the F1 score reached 97.5%.
Keywords: Generative Adversarial Networks, Android Malware, Data Augmentation, Wasserstein Generative Adversarial Network
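To make the image-representation step concrete, the following is a minimal sketch of one way a feature vector could be mapped to a grayscale image. The abstract does not specify the encoding, so the function name `features_to_grayscale`, the min-max scaling, the zero padding, and the square layout are all assumptions made for illustration, not the study's actual procedure.

```python
import math

import numpy as np


def features_to_grayscale(features, side=None):
    """Map a 1-D feature vector to a square uint8 grayscale image.

    Hypothetical encoding: values are min-max scaled to 0..255, then laid
    out row by row; unused pixels are zero-padded.
    """
    vec = np.asarray(features, dtype=np.float64).ravel()
    if vec.size == 0:
        raise ValueError("feature vector must not be empty")
    if side is None:
        # Smallest square that can hold every feature.
        side = math.ceil(math.sqrt(vec.size))

    lo, hi = vec.min(), vec.max()
    if hi == lo:
        # Constant vector: avoid division by zero, map everything to 0.
        scaled = np.zeros_like(vec)
    else:
        scaled = (vec - lo) / (hi - lo) * 255.0

    # Truncate if the caller asked for an image smaller than the vector.
    n = min(vec.size, side * side)
    pixels = np.zeros(side * side, dtype=np.uint8)
    pixels[:n] = np.rint(scaled[:n]).astype(np.uint8)
    return pixels.reshape(side, side)


# Example: five binary features (e.g. permission flags) fit a 3x3 image.
img = features_to_grayscale([0, 1, 1, 0, 1])
```

Fixing `side` explicitly (for example `side=32` or `side=64`) would be one way to study the effect of image size that the abstract mentions; the specific sizes here are illustrative only.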
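The WGAN versus DCGAN comparison rests largely on a difference in training objective. The sketch below contrasts the standard formulations in plain NumPy: DCGAN's discriminator is usually trained with binary cross-entropy, while the original WGAN trains a critic on a difference of mean scores and clips its weights. The function names and the clipping bound `c=0.01` are illustrative assumptions; the abstract does not state which loss variants or hyperparameters the study used.

```python
import numpy as np


def dcgan_discriminator_loss(real_logits, fake_logits):
    """Binary cross-entropy on raw logits: real targets 1, fake targets 0."""
    real = np.asarray(real_logits, dtype=np.float64)
    fake = np.asarray(fake_logits, dtype=np.float64)
    # logaddexp(0, -x) == -log(sigmoid(x)); logaddexp(0, x) == -log(1 - sigmoid(x))
    return np.mean(np.logaddexp(0.0, -real)) + np.mean(np.logaddexp(0.0, fake))


def wgan_critic_loss(real_scores, fake_scores):
    """Critic minimizes E[f(fake)] - E[f(real)]; scores are unbounded."""
    return np.mean(fake_scores) - np.mean(real_scores)


def clip_weights(weights, c=0.01):
    """Weight clipping from the original WGAN, keeping the critic roughly Lipschitz."""
    return [np.clip(w, -c, c) for w in weights]


# An undecided discriminator (all logits 0) has loss 2 * ln(2).
bce = dcgan_discriminator_loss([0.0, 0.0], [0.0, 0.0])
# A critic that scores real samples higher than fake ones has a negative loss.
critic = wgan_critic_loss([2.0, 4.0], [1.0, 1.0])
clipped = clip_weights([np.array([-1.0, 0.005, 1.0])])
```

The critic loss is an estimate of a distance between real and generated distributions, not a classification error, which is one reason the two models can behave differently when generating synthetic malware images.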