File Fragment Classification using Light-Weight Convolutional Neural Networks

In digital forensics, file fragment classification is an important step toward completing the file carving process. Several techniques identify the type of a file fragment without relying on metadata, for example by using features such as header/footer signatures and N-grams. Recently, convolutional neural network (CNN) models have been used to build classifiers for this task. However, the number of parameters in CNNs tends to grow exponentially as the number of layers increases, which results in a dramatic increase in training and inference time. In this paper, we propose light-weight file fragment classification models based on depthwise separable CNNs. The evaluation results show that our proposed models provide faster inference with accuracy comparable to state-of-the-art CNN-based models. In particular, our models achieve an accuracy of 79% on the FFT-75 dataset with nearly 100K parameters and 164M FLOPs, which is 4x smaller and 6x faster than the state-of-the-art classifier in the literature.
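To illustrate why depthwise separable convolutions reduce model size, the following minimal sketch compares the weight count of a standard 1D convolution with that of a depthwise separable one. It assumes 1D convolutions over byte sequences and ignores bias terms; the function names and the layer sizes (64 input channels, 128 output channels, kernel size 3) are hypothetical and chosen only for illustration, not taken from the paper.

```python
def standard_conv1d_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights in a standard 1D convolution (bias ignored).

    Every output channel has its own kernel over all input channels.
    """
    return kernel * c_in * c_out


def separable_conv1d_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights in a depthwise separable 1D convolution (bias ignored).

    Depthwise step: one kernel per input channel  -> kernel * c_in
    Pointwise step: a 1x1 convolution mixing channels -> c_in * c_out
    """
    depthwise = kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only.
C_IN, C_OUT, KERNEL = 64, 128, 3

standard = standard_conv1d_params(C_IN, C_OUT, KERNEL)
separable = separable_conv1d_params(C_IN, C_OUT, KERNEL)

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"reduction: {standard / separable:.2f}x")
```

For these illustrative sizes the separable layer needs roughly a third of the weights of the standard layer. The saving grows with kernel size, since the ratio is approximately 1/c_out + 1/kernel.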

