An extended asymmetric sigmoid with Perceptron (SIGTRON) for imbalanced linear classification

This article presents a new polynomial parameterized sigmoid called SIGTRON, which is an extended asymmetric sigmoid with Perceptron, and its companion convex model, called the SIGTRON-imbalanced classification (SIC) model, which employs a virtual SIGTRON-induced convex loss function. In contrast to the conventional $\pi$-weighted cost-sensitive learning model, the SIC model has no external $\pi$-weight on the loss function; instead, it has internal parameters in the virtual SIGTRON-induced loss function. As a consequence, we show that when the given training dataset is close to the well-balanced condition with respect to the (scale-)class-imbalance ratio, the proposed SIC model is more adaptive to variations of the dataset, such as an inconsistency in the (scale-)class-imbalance ratio between the training and test datasets. This adaptation is justified by a skewed hyperplane equation, derived by linearizing the gradient that satisfies the $\epsilon$-optimal condition. Additionally, we present a quasi-Newton optimization (L-BFGS) framework for the virtual convex loss by developing an interval-based bisection line search. Empirically, we observe that the proposed approach outperforms (or is comparable to) the $\pi$-weighted convex focal loss and the balanced LIBLINEAR classifiers (logistic regression, SVM, and L2SVM) in terms of test classification accuracy on $51$ two-class and $67$ multi-class datasets. In binary classification problems where the scale-class-imbalance ratio of the training dataset is not significant but an inconsistency exists, the group of SIC models formed by taking, for each dataset, the model with the best test accuracy (TOP$1$) outperforms LIBSVM (C-SVC with RBF kernel), a well-known kernel-based classifier.
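The abstract contrasts SIC with the conventional $\pi$-weighted cost-sensitive model and mentions an L-BFGS framework driven by an interval-based bisection line search. Because the abstract does not define SIGTRON or the SIC loss, the minimal sketch below uses the conventional $\pi$-weighted, L2-regularized logistic loss as a stand-in convex objective; it is not the authors' SIC model or their exact algorithm. The function names (`pi_weighted_logistic`, `bisection_line_search`, `lbfgs`), the weighting convention ($\pi$ on positives, $1-\pi$ on negatives), the doubling bracket, the tolerances, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

# Sketch only: the SIGTRON/SIC loss is not defined in the abstract, so the
# conventional pi-weighted, L2-regularized logistic loss stands in as the
# convex objective. All function names here are hypothetical.


def pi_weighted_logistic(w, X, y, pi, lam):
    """Return (loss, gradient) for labels y in {-1, +1}.

    Assumed convention: positives get weight pi, negatives 1 - pi.
    """
    c = np.where(y > 0, pi, 1.0 - pi)            # per-sample cost weights
    m = y * (X @ w)                               # margins y_i * x_i.w
    loss = np.sum(c * np.logaddexp(0.0, -m)) + 0.5 * lam * w @ w
    r = -y * c * 0.5 * (1.0 - np.tanh(0.5 * m))   # dloss_i/d(x_i.w) via stable sigmoid
    return loss, X.T @ r + lam * w


def bisection_line_search(f, x, d, tol=1e-6, max_iter=60):
    """Approximately minimize phi(a) = f(x + a d) for convex f and descent d.

    phi'(a) = grad f(x + a d) . d is nondecreasing, so once an interval
    [lo, hi] with phi'(lo) < 0 <= phi'(hi) is found, bisection on the
    sign of phi' shrinks it around the minimizer.
    """
    dphi = lambda a: f(x + a * d)[1] @ d
    g0 = abs(dphi(0.0))
    lo, hi = 0.0, 1.0
    while dphi(hi) < 0.0 and hi < 1e8:            # expand until bracketed
        lo, hi = hi, 2.0 * hi
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        slope = dphi(mid)
        if abs(slope) <= tol * g0:                # relative stationarity test
            return mid
        if slope < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lbfgs(f, x0, mem=5, tol=1e-6, max_iter=200):
    """Limited-memory BFGS (two-loop recursion) using the bisection search."""
    x = x0.astype(float)
    g = f(x)[1]
    S, Y = [], []
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        q, hist = g.copy(), []
        for s, yv in zip(reversed(S), reversed(Y)):       # newest pair first
            rho = 1.0 / (yv @ s)
            a = rho * (s @ q)
            q -= a * yv
            hist.append((rho, a))
        if S:                                             # initial scaling H0 = gamma * I
            q *= (S[-1] @ Y[-1]) / (Y[-1] @ Y[-1])
        for s, yv, (rho, a) in zip(S, Y, reversed(hist)):  # oldest pair first
            q += (a - rho * (yv @ q)) * s
        d = -q                                            # quasi-Newton direction
        x_new = x + bisection_line_search(f, x, d) * d
        g_new = f(x_new)[1]
        s, yv = x_new - x, g_new - g
        if s @ yv > 1e-12:                                # keep only positive-curvature pairs
            S.append(s)
            Y.append(yv)
            if len(S) > mem:
                S.pop(0)
                Y.pop(0)
        x, g = x_new, g_new
    return x


# Usage on synthetic imbalanced 2-D data (20 positives, 180 negatives).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 1.0, (20, 2)), rng.normal(-1.0, 1.0, (180, 2))])
X = np.hstack([X, np.ones((200, 1))])                     # bias column
y = np.concatenate([np.ones(20), -np.ones(180)])
pi = 180 / 200                                            # up-weight the minority class
f = lambda w: pi_weighted_logistic(w, X, y, pi, lam=1e-2)
w = lbfgs(f, np.zeros(3))
print(w, np.linalg.norm(f(w)[1]))
```

The bisection relies only on convexity: along a descent direction the directional derivative is nondecreasing, so its sign alone tells which half of the bracket contains the minimizer, and no Wolfe constants are needed. Each probe costs one gradient evaluation.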

翻译：本文提出了一种新的多项式参数化Sigmoid函数——SIGTRON，它是一种带有感知机的扩展非对称Sigmoid函数，并提出了其伴随凸模型——SIGTRON非平衡分类（SIC）模型。该模型采用了一种虚拟的SIGTRON诱导凸损失函数。与传统的$\pi$加权代价敏感学习模型不同，SIC模型在损失函数上没有外部$\pi$权重，但在虚拟SIGTRON诱导损失函数中具有内部参数。因此，当给定的训练数据集在考虑（尺度）类别不平衡比率时接近良好平衡条件时，我们证明了所提出的SIC模型对数据集的变化具有更强的适应性，例如训练集和测试集之间（尺度）类别不平衡比率的不一致。这种适应性通过一个偏斜的超平面方程得到验证，该方程通过满足$\epsilon$最优条件的梯度线性化生成。此外，我们提出了一种基于区间二分线搜索的拟牛顿优化（L-BFGS）框架，用于虚拟凸损失。通过实验，我们在$51$个二分类和$67$个多分类数据集上观察到，所提出的方法在测试分类准确率方面优于（或与）$\pi$加权凸焦点损失和平衡分类器LIBLINEAR（逻辑回归、支持向量机和L2SVM）相当。在二分类问题中，当训练数据集的尺度类别不平衡比率不显著但存在不一致性时，每组SIC模型中对每个数据集取得最佳测试准确率的模型（TOP$1$）优于著名的基于核的分类器LIBSVM（使用RBF核的C-SVC）。