The support vector machine (SVM) is a powerful classifier for binary classification. However, the non-differentiability of the SVM hinge loss function can lead to computational difficulties in high-dimensional settings. To overcome this problem, we rely on Bernstein polynomials and propose a new smoothed version of the SVM hinge loss, called the Bernstein support vector machine (BernSVM), which is suitable for the high-dimensional $p \gg n$ regime. As the BernSVM objective loss function is of class $C^2$, we propose two efficient algorithms for computing the solution of the penalized BernSVM. The first algorithm is based on coordinate descent with the majorization-minimization (MM) principle, and the second is an IRLS-type (iteratively reweighted least squares) algorithm. Under standard assumptions, we derive a cone condition and a restricted strong convexity condition to establish an upper bound for the weighted Lasso BernSVM estimator. Using a local linear approximation, we extend the latter result to penalized BernSVM with the nonconvex penalties SCAD and MCP. Our bound holds with high probability and achieves a rate of order $\sqrt{s\log(p)/n}$, where $s$ is the number of active features. Simulation studies compare the prediction accuracy of BernSVM with that of its competitors and compare the two algorithms in terms of computation time and estimation error. The use of the proposed method is illustrated through the analysis of three large-scale real data examples.
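To make the smoothing idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a smoothing window $[1-\delta, 1+\delta]$ around the hinge kink and replaces the hinge there with a degree-4 Bernstein polynomial whose coefficients $(\delta, \delta/2, 0, 0, 0)$ are chosen so that the value and the first two derivatives match the hinge at both ends of the window. Under this assumption the loss is $C^2$ and its second derivative is bounded by $3/(4\delta)$, which is what a quadratic majorization needs. The exact definition used in the paper may differ. The names `bern_hinge`, `bern_hinge_grad`, `fit_lasso_mm_cd`, the parameter `delta`, and the plain (unweighted) Lasso penalty are illustrative choices.

```python
import numpy as np


def _window_coord(t, delta):
    # Map the margin t to x in [0, 1] over the assumed window [1-delta, 1+delta].
    t = np.asarray(t, dtype=float)
    return np.clip((t - (1.0 - delta)) / (2.0 * delta), 0.0, 1.0)


def bern_hinge(t, delta=0.5):
    """Assumed smoothed hinge: 1 - t left of the window, 0 right of it,
    and delta * (1 - x)^3 * (1 + x) inside (a degree-4 Bernstein polynomial)."""
    t = np.asarray(t, dtype=float)
    x = _window_coord(t, delta)
    smooth = delta * (1.0 - x) ** 3 * (1.0 + x)
    return np.where(t < 1.0 - delta, 1.0 - t, smooth)


def bern_hinge_grad(t, delta=0.5):
    """Derivative with respect to t; equals -1 left of the window, 0 right of it."""
    x = _window_coord(t, delta)
    return -((1.0 - x) ** 2) * (1.0 + 2.0 * x)


def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def fit_lasso_mm_cd(X, y, lam, delta=0.5, n_sweeps=200):
    """Coordinate descent with a quadratic majorizer for
    mean(bern_hinge(y * (X @ beta + b0))) + lam * ||beta||_1, with y in {-1, +1}."""
    n, p = X.shape
    beta, b0 = np.zeros(p), 0.0
    M = 3.0 / (4.0 * delta)            # bound on the second derivative of the loss
    col_sq = (X ** 2).mean(axis=0)     # per-coordinate curvature scale
    margin = y * (X @ beta + b0)
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            g = np.mean(bern_hinge_grad(margin, delta) * y * X[:, j])
            Mj = M * col_sq[j]
            new = soft_threshold(Mj * beta[j] - g, lam) / Mj
            if new != beta[j]:
                margin += y * X[:, j] * (new - beta[j])
                beta[j] = new
        # Unpenalized intercept: minimize its quadratic majorizer.
        step = -np.mean(bern_hinge_grad(margin, delta) * y) / M
        b0 += step
        margin += y * step
    return beta, b0
```

Because each coordinate update minimizes a majorizer of the penalized objective, the objective cannot increase from one update to the next. Smaller values of `delta` bring the loss closer to the hinge but enlarge the curvature bound, so each step becomes more conservative.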