In digital forensics, file fragment classification is an important step toward completing the file carving process. Several techniques exist for identifying the type of a file fragment without relying on metadata, for example by using features such as headers/footers and N-grams. Recently, convolutional neural networks (CNNs) have been used to build classification models for this task. However, the number of parameters in CNNs tends to grow exponentially as the number of layers increases, which dramatically increases training and inference time. In this paper, we propose lightweight file fragment classification models based on depthwise separable CNNs. Evaluation results show that our proposed models achieve faster inference with accuracy comparable to state-of-the-art CNN-based models. In particular, our models achieve an accuracy of 79\% on the FFT-75 dataset with nearly 100K parameters and 164M FLOPs, making them 4x smaller and 6x faster than the state-of-the-art classifier in the literature.
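To illustrate where the savings of a depthwise separable convolution come from, the following minimal sketch counts the parameters and multiply-accumulate operations (MACs) of a standard 1D convolution and of its depthwise separable counterpart (a per-channel depthwise convolution followed by a 1x1 pointwise convolution). The function names and the layer sizes in the usage section (64 channels, kernel width 9, a 4096-byte fragment) are hypothetical choices for illustration, not the architecture proposed in this paper. It assumes stride 1 and "same" padding, and it counts MACs rather than FLOPs, since FLOP conventions vary (one MAC is often counted as two FLOPs).

```python
def standard_conv1d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    # Every output channel has a k-wide filter spanning every input channel.
    return c_in * c_out * k + (c_out if bias else 0)


def separable_conv1d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    # Depthwise step: one k-wide filter per input channel (no channel mixing).
    depthwise = c_in * k + (c_in if bias else 0)
    # Pointwise step: a 1x1 convolution that mixes the channels.
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def standard_conv1d_macs(c_in: int, c_out: int, k: int, length: int) -> int:
    # Assumes stride 1 and "same" padding, so the output length equals `length`.
    return c_in * c_out * k * length


def separable_conv1d_macs(c_in: int, c_out: int, k: int, length: int) -> int:
    # Depthwise cost plus pointwise cost, per output position.
    return (c_in * k + c_in * c_out) * length


if __name__ == "__main__":
    # Hypothetical layer: 64 -> 64 channels, kernel width 9, 4096-byte fragment.
    c_in, c_out, k, length = 64, 64, 9, 4096
    std_p = standard_conv1d_params(c_in, c_out, k)
    sep_p = separable_conv1d_params(c_in, c_out, k)
    print(f"params: standard={std_p}, separable={sep_p}, ratio={std_p / sep_p:.2f}")
    std_m = standard_conv1d_macs(c_in, c_out, k, length)
    sep_m = separable_conv1d_macs(c_in, c_out, k, length)
    print(f"MACs:   standard={std_m}, separable={sep_m}, ratio={std_m / sep_m:.2f}")
```

Ignoring biases, the reduction factor is 1 / (1/c_out + 1/k), so for wide layers it approaches the kernel width k; this is the general mechanism that lets depthwise separable models shrink both parameter count and compute.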