Concept drift is a significant challenge for malware detection: the performance of trained machine learning models degrades over time, eventually rendering them impractical. Prior research on concept drift adaptation for malware has focused primarily on active learning, which selects representative samples for labeling and uses them to update the model. Self-training, which retrains the model on its own pseudo-labels to adapt to shifting data distributions, has emerged as a promising approach to mitigating concept drift. In this research, we propose MORPH, an effective pseudo-label-based concept drift adaptation method designed specifically for neural networks. Through extensive experiments on Android and Windows malware datasets, we demonstrate the efficacy of our approach in mitigating the impact of concept drift. When combined with active learning, our method also reduces annotation effort. Furthermore, it significantly improves over existing work in automated concept drift adaptation for malware detection.
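To make the general idea of self-training concrete, the following is a minimal sketch of one generic pseudo-label adaptation round. It is not MORPH's algorithm, which the abstract does not specify. The logistic-regression model stands in for a neural network, and the function names and the 0.9 confidence threshold are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, w=None, lr=0.5, epochs=200):
    """Gradient-descent logistic regression (assumed stand-in for a neural network)."""
    n, d = X.shape
    if w is None:
        w = np.zeros(d + 1)          # last entry is the bias
    Xb = np.hstack([X, np.ones((n, 1))])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)
        w = w - lr * Xb.T @ (p - y) / n
    return w

def predict_proba(w, X):
    """Probability of the positive (malware) class for each row of X."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return sigmoid(Xb @ w)

def select_pseudo_labels(probs, threshold=0.9):
    """Keep only confident predictions; return (indices, hard labels).
    The threshold value is an illustrative assumption."""
    confidence = np.maximum(probs, 1.0 - probs)
    idx = np.flatnonzero(confidence >= threshold)
    return idx, (probs[idx] >= 0.5).astype(float)

def self_train_step(w, X_labeled, y_labeled, X_new, threshold=0.9):
    """One adaptation round: pseudo-label new samples, retrain on the union."""
    idx, pseudo = select_pseudo_labels(predict_proba(w, X_new), threshold)
    X_aug = np.vstack([X_labeled, X_new[idx]])
    y_aug = np.concatenate([y_labeled, pseudo])
    return fit_logreg(X_aug, y_aug, w=w), idx
```

In a drift setting, `self_train_step` would be called once per incoming batch of unlabeled samples (for example, each month of new applications), so that the model follows the shifting distribution without requiring new human annotations.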