Smartphones and their apps play an important role in today's digitized world. Some apps provide entertainment, others online banking, and others support two-factor authentication, to name just a few examples. Smartphones therefore handle sensitive information, which makes them a desirable target for malware. This technical report gives an overview of how machine learning, especially neural networks, can be employed to detect malicious Android apps based on their metadata. Detection based on metadata is necessary because Android's security design prevents one app from reading all of another app's information. For this purpose, a comparatively large dataset of app metadata has been collected for training and evaluation. After the introduction, the first section presents related work, followed by a description of the dataset's sources and the selection of features used for machine learning, in this case only the app permissions. Afterward, a freely available dataset is used to find an efficient and effective neural network model. Here, a fully connected network consisting of dense layers is chosen. This model is then trained and evaluated on the new, more extensive dataset to obtain a representative result. The model detects malware with an accuracy of 92.93% based on an app's permissions.
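As an illustration of the approach summarized above, the following minimal sketch encodes an app's requested permissions as a binary feature vector and passes it through a small fully connected network. It is not the report's model: the four-entry permission vocabulary, the layer sizes, the ReLU and sigmoid activations, and the random untrained weights are all assumptions chosen for illustration, and the helper names (`encode`, `dense`, `init_layer`, `predict`) are hypothetical.

```python
import math
import random

# Hypothetical permission vocabulary (assumption); the report's actual
# feature list is not given in this abstract.
PERMISSIONS = [
    "android.permission.INTERNET",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.CAMERA",
]


def encode(requested):
    """Map the permissions an app requests to a 0/1 feature vector."""
    requested = set(requested)
    return [1.0 if p in requested else 0.0 for p in PERMISSIONS]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, weights, biases, activation):
    """One fully connected (dense) layer: activation(W x + b)."""
    return [
        activation(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random, untrained weights and zero biases (illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict(x, layers):
    """Forward pass: ReLU hidden layers, then a sigmoid output in (0, 1)."""
    for weights, biases in layers[:-1]:
        x = dense(x, weights, biases, relu)
    weights, biases = layers[-1]
    return dense(x, weights, biases, sigmoid)[0]


rng = random.Random(0)
# Assumed architecture: 4 inputs -> 8 hidden units -> 1 output.
layers = [init_layer(len(PERMISSIONS), 8, rng), init_layer(8, 1, rng)]

features = encode(["android.permission.INTERNET", "android.permission.SEND_SMS"])
score = predict(features, layers)  # untrained, so the value is not meaningful
```

In a trained model, a score above a chosen threshold (for example 0.5) would be read as "malicious". With the random weights used here the score only demonstrates the data flow from permissions to a single output.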