Trojans are currently among the most threatening network attacks, and HTTP-based Trojans account for a considerable proportion of them. Moreover, as network environments grow more complex, HTTP-based Trojans are more concealed than other types. Many intrusion detection systems (IDSs) now struggle to detect such Trojan traffic effectively, owing to inherent shortcomings in the methods they use and to outdated training data. Classical anomaly detection and traditional machine learning-based (TML-based) anomaly detection depend heavily on expert knowledge for manual feature extraction, which is difficult to carry out for HTTP-based Trojan traffic. Deep learning-based (DL-based) anomaly detection has seen partial application in IDSs, but it cannot be transferred directly to HTTP-based Trojan traffic detection. To solve this problem, we propose HSTF-Model, a neural network detection model based on hierarchical spatiotemporal features of traffic. We also combine deep learning algorithms with expert knowledge through feature encoders and statistical characteristics to improve the model's self-learning ability. Experiments indicate that the F1 score of HSTF-Model can reach 99.4% on real traffic. In addition, we present BTHT, a dataset consisting of HTTP-based benign and Trojan traffic, to facilitate related research in the field.