As malware attacks increase in number and sophistication, malware detection systems based on machine learning (ML) are growing in importance. However, many popular ML models used for malware classification are supervised, and supervised classifiers often do not generalize well to novel malware. They must therefore be retrained frequently to detect new malware specimens, which can be time-consuming. Our work addresses this problem with a hybrid framework that combines theoretical quantum ML with feature selection strategies to reduce both the data size and the training time of the malware classifier. Preliminary results show that a variational quantum classifier (VQC) with XGBoost-selected features reaches 78.91% test accuracy on the simulator. On IBM 5-qubit machines, the model trained on XGBoost-selected features reaches an average accuracy of 74% (±11.35%).
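To illustrate why feature selection matters on a small quantum device, the following is a minimal sketch, not the authors' implementation. It assumes one feature per qubit and a simple min-max rescaling of each selected feature to a rotation angle in [0, π]; neither choice is stated in the abstract. The importance scores and samples are made up, and in a real pipeline the scores would come from a trained XGBoost model. The names `select_top_k`, `to_rotation_angles`, and `N_QUBITS` are hypothetical.

```python
import math

def select_top_k(importances, k):
    """Return the indices of the k largest importance scores, highest first."""
    ranked = sorted(range(len(importances)),
                    key=lambda i: importances[i], reverse=True)
    return ranked[:k]

def to_rotation_angles(rows, columns):
    """Keep only the chosen columns and min-max scale each one into [0, pi]."""
    kept = [[row[c] for c in columns] for row in rows]
    lows = [min(col) for col in zip(*kept)]
    highs = [max(col) for col in zip(*kept)]
    scaled = []
    for row in kept:
        scaled.append([
            0.0 if hi == lo else math.pi * (v - lo) / (hi - lo)
            for v, lo, hi in zip(row, lows, highs)
        ])
    return scaled

# Hypothetical per-feature importance scores (assumption: these stand in
# for the scores a fitted XGBoost model would report).
importances = [0.02, 0.30, 0.05, 0.18, 0.01, 0.22, 0.12, 0.10]

# Hypothetical malware feature vectors, one row per sample.
samples = [
    [1, 10, 3, 0, 7, 5, 2, 9],
    [2, 20, 1, 4, 7, 1, 6, 3],
    [0, 15, 2, 2, 7, 3, 4, 6],
]

N_QUBITS = 5  # assumption: one selected feature per qubit on a 5-qubit device

columns = select_top_k(importances, N_QUBITS)
angles = to_rotation_angles(samples, columns)
print(columns)
```

Under these assumptions, eight raw features are reduced to the five that fit the device, and each sample becomes a list of five angles that a parameterized circuit could consume.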