Accurate and timely detection of cyber threats is critical to keeping our online economy and data safe. A key technique in early detection is the classification of unusual patterns of network behaviour, which are often hidden as low-frequency events within complex time-series packet flows. One way to detect such anomalies is to analyse the information entropy of the payload within individual packets, since changes in entropy can often indicate suspicious activity, such as whether session encryption has been compromised or whether a plaintext channel has been co-opted as a covert channel. To decide whether activity is anomalous, we need to compare real-time entropy values with baseline values. While the analysis of entropy in packet data is not particularly new, to the best of our knowledge there are no published baselines for payload entropy across common network services. We offer two contributions: (1) we analyse several large packet datasets to establish baseline payload information entropy values for common network services; (2) we describe an efficient method for engineering entropy metrics when performing flow recovery from live or offline packet data, which can be expressed within feature subsets for subsequent analysis and machine learning applications.
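To make the notion of payload entropy concrete, the following is a minimal sketch of how the Shannon entropy of a single packet payload might be computed and compared with a service baseline. It is an illustration only, not the method described in the paper: the function names `payload_entropy` and `deviates_from_baseline`, the threshold multiplier `k`, and the use of a mean and standard deviation as the baseline summary are all assumptions made for this example.

```python
import math
from collections import Counter


def payload_entropy(payload: bytes) -> float:
    """Shannon entropy of a payload in bits per byte (range 0.0 to 8.0).

    Hypothetical helper for illustration; an empty payload is treated
    as having zero entropy.
    """
    if not payload:
        return 0.0
    n = len(payload)
    counts = Counter(payload)  # frequency of each byte value 0..255
    # H = -sum(p * log2(p)) over the byte values that actually occur
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def deviates_from_baseline(entropy: float,
                           baseline_mean: float,
                           baseline_std: float,
                           k: float = 3.0) -> bool:
    """Flag an entropy value more than k standard deviations from a baseline.

    The baseline statistics are supplied by the caller; the choice of a
    k-sigma rule is an assumption of this sketch, not taken from the paper.
    """
    return abs(entropy - baseline_mean) > k * baseline_std
```

Under this sketch, a payload consisting of one repeated byte scores 0 bits per byte, while a payload in which all 256 byte values occur equally often scores the maximum of 8 bits per byte. Encrypted or compressed payloads would be expected to sit near the upper end of that range, which is why a shift in entropy on a normally plaintext service can be worth investigating.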