Continuous Learning for Android Malware Detection

Machine learning methods can detect Android malware with very high accuracy. However, these classifiers have an Achilles heel, concept drift: they rapidly become out of date and ineffective as both malware apps and benign apps evolve. Our research finds that, after training an Android malware classifier on one year's worth of data, the F1 score quickly dropped from 0.99 to 0.76 after 6 months of deployment on new test samples. In this paper, we propose new methods to combat the concept drift problem of Android malware classifiers. Since machine learning techniques need to be continuously deployed, we use active learning: we select new samples for analysts to label, and then add the labeled samples to the training set to retrain the classifier. Our key idea is that similarity-based uncertainty is more robust against concept drift. Therefore, we combine contrastive learning with active learning. We propose a new hierarchical contrastive learning scheme and a new sample selection technique to continuously train the Android malware classifier. Our evaluation shows that this leads to significant improvements compared to previously published methods for active learning. Our approach reduces the false negative rate from 16% (for the best baseline) to 10%, while maintaining the same false positive rate (0.6%). Our approach also maintains more consistent performance across a seven-year time period than past methods.
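
The active learning loop described above (select uncertain samples, have an analyst label them, add them to the training set) can be illustrated with a minimal sketch. This is not the paper's hierarchical contrastive learning scheme or its sample selector. It assumes that embeddings already exist as plain tuples, uses a 1-nearest-neighbor rule in place of a trained classifier, and scores uncertainty as one minus the highest cosine similarity to any labeled sample. All function names (`cosine`, `similarity_uncertainty`, `predict`, `active_learning_round`) and the `oracle` callback standing in for the analyst are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_uncertainty(x, labeled):
    """Illustrative score: a sample far from every labeled embedding
    (low maximum similarity) is treated as highly uncertain."""
    return 1.0 - max(cosine(x, emb) for emb, _ in labeled)


def predict(x, labeled):
    """1-nearest-neighbor prediction: label of the most similar labeled sample."""
    return max(labeled, key=lambda pair: cosine(x, pair[0]))[1]


def active_learning_round(labeled, unlabeled, oracle, budget):
    """Pick the `budget` most uncertain samples, ask the oracle (analyst)
    for labels, and return the enlarged labeled set plus the leftovers."""
    ranked = sorted(
        unlabeled,
        key=lambda x: similarity_uncertainty(x, labeled),
        reverse=True,
    )
    picked, remaining = ranked[:budget], ranked[budget:]
    new_labeled = labeled + [(x, oracle(x)) for x in picked]
    return new_labeled, remaining


if __name__ == "__main__":
    # Toy 2-D embeddings: label 0 = benign, label 1 = malware (assumed).
    labeled = [((1.0, 0.0), 0), ((0.0, 1.0), 1)]
    unlabeled = [(0.9, 0.1), (-1.0, -1.0), (0.1, 0.9)]

    def analyst(x):
        # Stand-in for a human analyst; the rule is arbitrary.
        return 1 if x[1] > x[0] else 0

    labeled, unlabeled = active_learning_round(labeled, unlabeled, analyst, budget=1)
    print(len(labeled), len(unlabeled))
```

In this toy run the sample `(-1.0, -1.0)` is selected because it is dissimilar to both labeled embeddings, which is the intuition behind similarity-based uncertainty: samples unlike anything seen so far are the ones most likely to reflect drift. With a learned encoder, the "retrain" step would also update the embedding model. Here the nearest-neighbor rule only needs the enlarged labeled set.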

