This article presents a new polynomial parameterized sigmoid called SIGTRON, which is an extended asymmetric sigmoid with Perceptron, and its companion convex model called the SIGTRON-imbalanced classification (SIC) model, which employs a virtual SIGTRON-induced convex loss function. In contrast to the conventional $\pi$-weighted cost-sensitive learning model, the SIC model has no external $\pi$-weight on the loss function; instead, it has internal parameters in the virtual SIGTRON-induced loss function. As a consequence, when the given training dataset is close to the well-balanced condition with respect to the (scale-)class-imbalance ratio, we show that the proposed SIC model is more adaptive to variations of the dataset, such as an inconsistency of the (scale-)class-imbalance ratio between the training and test datasets. This adaptation is justified by a skewed hyperplane equation, created via linearization of the gradient satisfying the $\epsilon$-optimal condition. Additionally, we present a quasi-Newton optimization (L-BFGS) framework for the virtual convex loss by developing an interval-based bisection line search. Empirically, we observe that the proposed approach outperforms (or is comparable to) the $\pi$-weighted convex focal loss and the balanced classifier LIBLINEAR (logistic regression, SVM, and L2SVM) in terms of test classification accuracy on $51$ two-class and $67$ multi-class datasets. In binary classification problems where the scale-class-imbalance ratio of the training dataset is not significant but such an inconsistency exists, a group of SIC models with the best test accuracy for each dataset (TOP$1$) outperforms LIBSVM (C-SVC with RBF kernel), a well-known kernel-based classifier.
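For readers unfamiliar with the baseline the abstract contrasts against, the following is a minimal sketch of a conventional $\pi$-weighted logistic loss, in which the class weight multiplies the loss from the outside. It is not the SIC model or the SIGTRON-induced loss, whose definitions are not given in this abstract. The function name `pi_weighted_logistic_loss`, the label convention $y \in \{-1, +1\}$, and the choice of weighting positives by $\pi$ and negatives by $1-\pi$ are assumptions made for illustration.

```python
import numpy as np


def pi_weighted_logistic_loss(w, X, y, pi=0.5):
    """Mean pi-weighted logistic loss for a linear classifier (illustrative).

    Assumptions (not taken from the abstract): labels y are in {-1, +1},
    positive samples are weighted by pi and negative samples by 1 - pi.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    margins = y * (X @ w)
    # log(1 + exp(-margin)), evaluated in a numerically stable way
    per_sample = np.logaddexp(0.0, -margins)
    # External class weight: this is the part the SIC model is said to avoid
    weights = np.where(y > 0, pi, 1.0 - pi)
    return float(np.mean(weights * per_sample))


# Toy data: three positive samples and one negative sample
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, 1.0, 1.0, -1.0])
w0 = np.zeros(2)

loss_balanced = pi_weighted_logistic_loss(w0, X, y, pi=0.5)
loss_reweighted = pi_weighted_logistic_loss(w0, X, y, pi=0.25)
```

In this sketch, changing `pi` rescales each class's contribution to the objective without changing the shape of the per-sample loss, which is the sense in which the weight is external.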