This paper reveals a data bias issue that can severely degrade performance when building a machine learning model for malicious URL detection. We describe how such bias can be identified using interpretable machine learning techniques, and further argue that such biases naturally exist in the real-world security data used to train classification models. We then propose a debiased training strategy that can be applied to most deep-learning-based models to mitigate the negative effects of biased features. The solution is based on self-supervised adversarial training, which trains deep neural networks to learn invariant embeddings from biased data. We conduct a wide range of experiments to demonstrate that the proposed strategy leads to significantly better generalization for both CNN-based and RNN-based detection models.
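Adversarial training toward bias-invariant embeddings, as named above, is commonly realized with a gradient-reversal step: a bias predictor is trained to recover the biased feature from the embedding, while the reversed gradient pushes the encoder to make that feature unrecoverable. The following minimal sketch illustrates only the gradient-reversal mechanism; the function names and the scaling factor `lam` are hypothetical and do not come from the paper:

```python
def grad_reverse_forward(x):
    # Forward pass is the identity: the bias predictor sees
    # the encoder's embedding unchanged.
    return x

def grad_reverse_backward(grad, lam=1.0):
    # Backward pass negates (and scales) the bias predictor's
    # gradient before it reaches the shared encoder, so gradient
    # descent drives the embedding toward being uninformative
    # about the biased feature.
    return [-lam * g for g in grad]

# Toy demonstration: gradient from the bias head on a 2-d embedding.
bias_grad = [0.2, 0.4]
encoder_grad = grad_reverse_backward(bias_grad, lam=0.5)
print(encoder_grad)  # the encoder receives the negated, scaled gradient
```

In a full training loop, this reversed gradient would be added to the gradient of the main detection loss, so the encoder jointly optimizes detection accuracy and bias invariance.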