The growth of Internet services has caused a surge in network traffic, and machine learning on the resulting data has become an indispensable tool, especially in risk-sensitive applications. This paper focuses on network traffic classification in the presence of class imbalance, which is fundamental and ubiquitous in Internet data analysis. Class imbalance typically shifts the learned decision boundary away from the optimum and yields a suboptimal solution. This raises severe safety concerns in the network traffic field, where pattern recognition is challenging because malicious traffic spans numerous minority classes. To mitigate these effects, we design a \textit{group \& reweight} strategy. Inspired by the group distributionally robust optimization framework, our approach heuristically clusters classes into groups, iteratively updates the non-parametric weights for separate classes, and optimizes the learning model by minimizing the reweighted losses. We theoretically interpret the optimization process from the perspective of a Stackelberg game and perform extensive experiments on typical benchmarks. Results show that our approach not only suppresses the negative effect of class imbalance but also improves overall prediction performance.
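The three steps named in the abstract (cluster classes into groups, update weights, minimize a reweighted loss) can be illustrated with a minimal sketch. The abstract does not give the paper's actual clustering heuristic or weight-update rule, so everything below is an assumption: `group_classes_by_size` is a hypothetical heuristic that buckets classes by sample count, weights are kept per group rather than per class for brevity, and `update_group_weights` uses the exponentiated-gradient step commonly associated with group distributionally robust optimization rather than the authors' own update.

```python
import numpy as np


def group_classes_by_size(class_counts, n_groups):
    """Hypothetical heuristic: bucket classes into groups of similar sample count."""
    counts = np.asarray(class_counts, dtype=float)
    order = np.argsort(counts)  # class indices, smallest class first
    groups = np.empty(len(counts), dtype=int)
    for g, chunk in enumerate(np.array_split(order, n_groups)):
        groups[chunk] = g  # group 0 holds the rarest classes
    return groups


def update_group_weights(weights, group_losses, step_size=0.5):
    """Assumed exponentiated-gradient step: groups with higher loss gain weight."""
    losses = np.asarray(group_losses, dtype=float)
    scaled = np.asarray(weights, dtype=float) * np.exp(step_size * losses)
    return scaled / scaled.sum()  # renormalize so the weights sum to 1


def reweighted_loss(per_sample_loss, labels, groups, weights):
    """Weighted sum of the mean loss in each group, plus the per-group means."""
    sample_groups = groups[labels]  # map each sample's class to its group
    group_losses = np.zeros(len(weights))
    for g in range(len(weights)):
        mask = sample_groups == g
        if mask.any():
            group_losses[g] = per_sample_loss[mask].mean()
    return float(weights @ group_losses), group_losses


# Toy usage: two majority classes and two minority classes.
groups = group_classes_by_size([1000, 900, 20, 10], n_groups=2)
weights = np.array([0.5, 0.5])
losses = np.array([0.1, 0.2, 2.0, 3.0])  # one sample per class
labels = np.array([0, 1, 2, 3])
total, group_losses = reweighted_loss(losses, labels, groups, weights)
weights = update_group_weights(weights, group_losses)
```

In a training loop, the model would take a gradient step on `total` and the weights would then be refreshed from the new group losses. This alternation between a model update and a weight update is the kind of two-player structure that a Stackelberg-game interpretation describes.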