Support vector machines (SVMs) are well-studied supervised learning models for binary classification. In many applications, large numbers of samples can be obtained cheaply and easily, whereas manually labeling them is often costly and error-prone. Semi-supervised support vector machines (S3VMs) extend the well-known SVM classifiers to the semi-supervised setting, aiming to maximize the margin between samples in the presence of unlabeled data. By leveraging both labeled and unlabeled data, S3VMs attempt to achieve better accuracy and robustness than traditional SVMs. Unfortunately, the resulting optimization problem is non-convex and hence difficult to solve exactly. In this paper, we present a new branch-and-cut approach for S3VMs based on semidefinite programming (SDP) relaxations. We apply optimality-based bound tightening to bound the feasible set. Box constraints in turn allow us to include valid inequalities that strengthen the lower bound. The resulting SDP relaxation provides bounds significantly stronger than those available in the literature. For the upper bound, we define a local search that exploits the solution of the SDP relaxation. Computational results highlight the efficiency of the algorithm, showing that it can solve instances with 10 times as many data points as those previously solved in the literature.
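The non-convexity mentioned above can be illustrated with a small numerical sketch. It assumes a common linear S3VM formulation (hinge loss on labeled points, a symmetric "hat" loss on unlabeled points), which is not necessarily the exact model used in the paper; the function names, the penalty parameters `C_lab` and `C_unl`, and the toy data are all hypothetical.

```python
import numpy as np

def s3vm_objective(w, b, X_lab, y_lab, X_unl, C_lab=1.0, C_unl=1.0):
    """Assumed linear S3VM objective (illustrative, not the paper's model).

    0.5*||w||^2 + C_lab * sum hinge(y_i * f(x_i)) + C_unl * sum hat(f(x_j)),
    where f(x) = w.x + b and hat(t) = max(0, 1 - |t|).
    """
    # Hinge loss: penalizes labeled points with margin below 1.
    hinge = np.maximum(0.0, 1.0 - y_lab * (X_lab @ w + b))
    # Hat loss: penalizes unlabeled points close to the hyperplane,
    # whichever side they fall on. This term is what breaks convexity.
    hat = np.maximum(0.0, 1.0 - np.abs(X_unl @ w + b))
    return 0.5 * float(w @ w) + C_lab * hinge.sum() + C_unl * hat.sum()

def convexity_gap(a=0.4):
    """Return f(midpoint) - average(f(w1), f(w2)) for w1 = a, w2 = -a.

    A convex function would never give a positive value here.
    """
    # Toy 1-D data: two labeled points and one unlabeled point (made up).
    X_lab = np.array([[2.0], [-2.0]])
    y_lab = np.array([1.0, -1.0])
    X_unl = np.array([[0.5]])

    def f(w_scalar):
        return s3vm_objective(np.array([w_scalar]), 0.0, X_lab, y_lab, X_unl)

    return f(0.0) - 0.5 * (f(a) + f(-a))

if __name__ == "__main__":
    print("midpoint value minus chord value:", convexity_gap())
```

A positive gap means the objective lies above the chord between two points, which violates the definition of convexity. This is one way to see why exact methods such as branch-and-cut, rather than a single convex solve, are needed.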