In recent years, image-based techniques for malware detection have gained prominence, with numerous studies demonstrating the efficacy of deep learning approaches such as Convolutional Neural Networks (CNNs) in classifying images derived from executable files. In this paper, we consider a method in which features extracted from executable files are converted into QR and Aztec codes. These codes capture structural patterns in a format that may enhance the learning capabilities of CNNs. We design and implement CNN architectures tailored to the properties of these codes and evaluate them on two large malware datasets, both of which include a substantial number of benign samples. Our results are mixed: CNNs trained on QR and Aztec codes outperform the state of the art on one dataset but underperform more typical techniques on the other. These results indicate that QR and Aztec codes hold promise as a form of feature engineering in the malware domain, and that additional research is needed to better understand the relative strengths and weaknesses of this approach.