The growth of Internet services has caused a surge in network traffic, and machine learning on this traffic data has become an indispensable tool, especially in risk-sensitive applications. This paper focuses on network traffic classification under severe class imbalance. Such an imbalanced distribution tends to shift the learned decision boundary away from the optimal one and yields unsatisfactory solutions. This raises safety concerns in the network traffic field, because previous class-imbalance methods can hardly handle numerous minority malicious classes. To alleviate these effects, we design a \textit{group \& reweight} strategy. Inspired by the group distributionally robust optimization framework, our approach heuristically clusters classes into groups, iteratively updates the non-parametric weights of the separate classes, and optimizes the learning model by minimizing the reweighted losses. We theoretically interpret the optimization process as a Stackelberg game and perform extensive experiments on typical benchmarks. Results show that our approach not only suppresses the negative effect of class imbalance but also improves overall prediction performance.
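The group, reweight, and minimize loop described above can be illustrated with a minimal sketch. This is not the paper's implementation: the frequency-based grouping heuristic, the exponentiated-gradient weight update (borrowed from standard group distributionally robust optimization), the softmax-regression model, and all function names and hyperparameters (`group_classes_by_frequency`, `train_group_reweighted`, `lr`, `eta`) are assumptions chosen for illustration. Weights are kept per group here, so every class in a group shares one weight.

```python
import numpy as np

def group_classes_by_frequency(labels, num_classes, num_groups):
    """Hypothetical heuristic: sort classes by sample count, split into groups."""
    counts = np.bincount(labels, minlength=num_classes)
    order = np.argsort(-counts)  # most frequent class first
    class_to_group = np.empty(num_classes, dtype=int)
    for g, chunk in enumerate(np.array_split(order, num_groups)):
        class_to_group[chunk] = g
    return class_to_group

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_group_reweighted(X, y, num_classes, num_groups=2,
                           steps=200, lr=0.1, eta=0.1):
    """Alternate between updating group weights and the model.

    Assumes every group contains at least one training sample.
    """
    n, d = X.shape
    class_to_group = group_classes_by_frequency(y, num_classes, num_groups)
    sample_group = class_to_group[y]
    group_size = np.bincount(sample_group, minlength=num_groups)
    onehot = np.eye(num_classes)[y]
    W = np.zeros((d, num_classes))             # softmax-regression parameters
    q = np.full(num_groups, 1.0 / num_groups)  # non-parametric group weights
    for _ in range(steps):
        probs = softmax(X @ W)
        losses = -np.log(probs[np.arange(n), y] + 1e-12)
        group_loss = np.array([losses[sample_group == g].mean()
                               for g in range(num_groups)])
        # Weight step: exponentiated-gradient ascent shifts mass to high-loss groups.
        q = q * np.exp(eta * group_loss)
        q /= q.sum()
        # Model step: gradient descent on the reweighted loss sum_g q_g * mean loss of g.
        sample_w = q[sample_group] / group_size[sample_group]
        grad = X.T @ (sample_w[:, None] * (probs - onehot))
        W -= lr * grad
    return W, q, class_to_group

# Synthetic imbalanced data: one majority class and two minority classes.
rng = np.random.default_rng(0)
counts = [500, 30, 20]
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = np.vstack([rng.normal(c, 1.0, size=(m, 2)) for c, m in zip(centers, counts)])
X = np.hstack([X, np.ones((len(X), 1))])  # append a bias feature
y = np.repeat(np.arange(3), counts)

W, q, class_to_group = train_group_reweighted(X, y, num_classes=3)
print("class -> group:", class_to_group)
print("group weights:", q)
```

In this sketch the weight update and the model update respond to each other in alternation, which loosely mirrors the two-player reading of the optimization; the exact leader and follower roles in the paper's Stackelberg formulation are not specified in the abstract and are not modeled here.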