In recent years, the number of malware attacks that use encrypted HTTP traffic for self-propagation or communication has increased dramatically. Antivirus software and firewalls typically do not have access to the encryption keys, so direct detection of malicious encrypted data is unlikely to succeed. However, previous work has shown that traffic analysis can provide indications of malicious intent even when the underlying data remains encrypted. In this paper, we apply three machine learning techniques to the problem of distinguishing malicious encrypted HTTP traffic from benign encrypted traffic and obtain results comparable to previous work. We then consider the problem of feature analysis in some detail. Previous work has often relied on human expertise to determine the most useful and informative features in this problem domain. We demonstrate that such feature-related information can be obtained directly from the machine learning models themselves. We argue that such a machine-learning-based approach to feature analysis is preferable because it is more reliable and because it can, for example, uncover relatively unintuitive interactions between features.
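As an illustration of how feature-related information can be read off a trained model rather than supplied by a human expert, the following minimal sketch computes permutation importance: each feature column is shuffled in turn, and the resulting drop in accuracy is taken as that feature's importance. The nearest-centroid classifier, the synthetic data, and the feature names `bytes_out` and `noise` are assumptions made purely for illustration; they are not the models, data, or features used in this paper.

```python
import random

def nearest_centroid_fit(X, y):
    """Return one centroid (mean feature vector) per class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def accuracy(centroids, X, y):
    hits = sum(predict(centroids, x) == t for x, t in zip(X, y))
    return hits / len(y)

def permutation_importance(centroids, X, y, n_features, rng, repeats=10):
    """Mean accuracy drop when each feature column is shuffled."""
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)
            # Rebuild the data set with only column j permuted.
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / repeats)
    return importances

# Hypothetical flow features: only "bytes_out" carries class information.
FEATURES = ["bytes_out", "noise"]
rng = random.Random(0)
X, y = [], []
for _ in range(100):
    X.append([rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)])  # benign
    y.append(0)
    X.append([rng.uniform(2.0, 3.0), rng.uniform(0.0, 1.0)])  # malicious
    y.append(1)

model = nearest_centroid_fit(X, y)
# For brevity the scores are computed on the training data; a real
# analysis would evaluate on a held-out set.
scores = permutation_importance(model, X, y, len(FEATURES), rng)
importance = dict(zip(FEATURES, scores))
print(importance)
```

In this toy setting the informative feature receives a large importance score and the uninformative one a score near zero, with no expert judgment involved. The same procedure applies to any trained classifier, since it only requires the ability to make predictions.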