Class imbalance is one of the most common and important problems in classification. In this paper, we present a new generalized framework, the soft-margin Weighted SVM with an Adaptive Weight function (AW-WSVM), which aims to mitigate the class-imbalance and outlier sensitivity of the standard support vector machine (SVM) on two-class data. A weight coefficient is introduced into the unconstrained soft-margin SVM, and the sample weights are updated before each training round. The Adaptive Weight function (AW function) is constructed from the distance between each sample and the decision hyperplane, and it assigns a different weight to each sample. We also propose a weight update method that takes into account the proximity of the support vectors to the decision hyperplane. Before training, the sample weights are initialized according to class. Subsequently, samples close to the decision hyperplane are identified and assigned larger weights, while samples far from the decision hyperplane are assigned smaller weights. Furthermore, we put forward an effective way to eliminate noise. To evaluate the proposed framework, we conducted experiments on standard datasets and on emotion classification datasets with different imbalance ratios (IR). The experimental results show that the proposed framework achieves superior accuracy, recall, and G-mean, validating the effectiveness of the weighting strategy presented in this paper for enhancing support vector machines.
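The training loop described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the exact form of the AW function, the noise rule, or the optimizer, so the exponential decay `exp(-|d|)`, the `noise_margin` threshold, the inverse-frequency initialization, and all function names below are assumptions chosen only for illustration. The sketch minimizes the weighted hinge objective 0.5·||w||² + C·Σ sᵢ·max(0, 1 − yᵢ(w·xᵢ + b)) by plain subgradient descent.

```python
import math


def decision_value(w, b, x):
    """Linear decision function f(x) = w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def initial_weights(y):
    """Class-based initialization (assumed: inverse class frequency)."""
    n = len(y)
    n_pos = sum(1 for yi in y if yi == 1)
    n_neg = n - n_pos
    return [n / (2.0 * n_pos) if yi == 1 else n / (2.0 * n_neg) for yi in y]


def train_weighted_svm(X, y, s, C=1.0, epochs=200, lr=0.01):
    """Subgradient descent on the weighted unconstrained soft-margin objective."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5 * ||w||^2 term
        grad_b = 0.0
        for xi, yi, si in zip(X, y, s):
            if yi * decision_value(w, b, xi) < 1.0:  # margin violated
                for j, xj in enumerate(xi):
                    grad_w[j] -= C * si * yi * xj
                grad_b -= C * si * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def adaptive_weights(X, y, w, b, base, noise_margin=2.0):
    """Hypothetical AW function: weight decays with distance to the hyperplane.

    Samples lying more than `noise_margin` on the wrong side of the
    hyperplane are treated as noise and get weight 0 (an assumed rule).
    """
    norm = math.sqrt(sum(wj * wj for wj in w)) or 1.0
    weights = []
    for xi, yi, bi in zip(X, y, base):
        signed = yi * decision_value(w, b, xi) / norm
        if signed < -noise_margin:
            weights.append(0.0)
        else:
            weights.append(bi * math.exp(-abs(signed)))
    return weights


def fit_aw_wsvm(X, y, rounds=3, C=1.0):
    """Initialize weights by class, then update them before each retraining."""
    base = initial_weights(y)
    s = list(base)
    w, b = train_weighted_svm(X, y, s, C=C)
    for _ in range(rounds - 1):
        s = adaptive_weights(X, y, w, b, base)
        w, b = train_weighted_svm(X, y, s, C=C)
    return w, b, s


def predict(w, b, x):
    """Return +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0.0 else -1
```

A call such as `fit_aw_wsvm(X, y)` with labels in `{+1, -1}` returns the final hyperplane and the sample weights used in the last round. Restarting each round from zero keeps the sketch simple; a warm start from the previous `w` and `b` would be an equally reasonable choice.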