With the rapid advancement of Internet technology, malware poses a growing threat to computer systems and network security. It endangers individual privacy and puts the critical infrastructure of enterprises and nations at risk. The rising volume and complexity of malware, together with its stealth and diversity, strain traditional detection techniques: static methods struggle against variants and packed samples, while dynamic methods incur high costs and execution risks that limit their use. More accurate and robust detection techniques are therefore needed. This study first uses the MinHash algorithm to convert malware binaries into grayscale images, then extracts global and local texture features with the GIST and LBP algorithms. It also uses IDA Pro to disassemble the samples and extract opcode sequences, which are vectorized with n-gram and TF-IDF. Fusing the image and opcode features lets the model capture the characteristics of malware more comprehensively. For the model, a CNN-BiLSTM fusion architecture is designed to process image features and opcode sequences jointly, improving classification performance. Experiments on multiple public datasets show that the proposed method significantly outperforms traditional detection techniques in accuracy, recall, and F1 score, and that it is more stable when detecting variants and obfuscated malware. The results support the effectiveness of feature and model fusion, offer new insight into the development of malware detection, and suggest promising prospects for application.
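To make the opcode branch of the pipeline concrete, the following is a minimal sketch of n-gram extraction followed by TF-IDF weighting. It is an illustration under stated assumptions, not the study's implementation: the function names `opcode_ngrams` and `tfidf_vectors`, the choice of bigrams, the smoothed IDF formula, and the two toy opcode sequences are all hypothetical, and real input would come from disassembler output.

```python
import math
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    """Return the contiguous opcode n-grams of a sequence as tuples."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def tfidf_vectors(samples, n=2):
    """Compute one TF-IDF vector per sample over a shared n-gram vocabulary.

    Assumed weighting (other variants are equally plausible):
      tf  = n-gram count / total n-grams in the sample
      idf = ln((1 + N) / (1 + df)) + 1, a smoothed form
    """
    counts = [Counter(opcode_ngrams(s, n)) for s in samples]
    vocab = sorted({gram for c in counts for gram in c})
    # Iterating a Counter yields each distinct n-gram once per sample,
    # so this counts the number of samples containing each n-gram.
    doc_freq = Counter(gram for c in counts for gram in c)
    n_docs = len(samples)
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1  # avoid division by zero for short inputs
        vectors.append([
            (c[gram] / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[gram])) + 1)
            for gram in vocab
        ])
    return vocab, vectors


# Hypothetical opcode sequences standing in for disassembler output.
samples = [
    ["push", "mov", "call", "mov", "call"],
    ["push", "mov", "xor", "ret"],
]
vocab, vectors = tfidf_vectors(samples, n=2)
```

Each row of `vectors` is a fixed-length feature vector indexed by `vocab`. In a fusion setting such as the one described above, vectors of this kind would be combined with the image-derived texture features before classification.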