Support vector machines (SVMs) and other kernel techniques represent a family of powerful statistical classification methods with high accuracy and broad applicability. Because they use all or a significant portion of the training data, however, they can be slow, especially for large problems. Piecewise linear classifiers are similarly versatile, yet have the additional advantages of simplicity, ease of interpretation and, if the number of component linear classifiers is not too large, speed. Here we show how a simple, piecewise linear classifier can be trained from a kernel-based classifier in order to improve the classification speed. The method works by finding the root of the difference in conditional probabilities between pairs of opposite classes to build up a representation of the decision boundary. When tested on 17 different datasets, it succeeded in improving the classification speed of an SVM for 12 of them, by up to two orders of magnitude. For two of these 12, however, the resulting classifier was less accurate than a simple linear classifier. The method is best suited to problems with continuous feature data and smooth probability functions. Because the component linear classifiers are built up individually from an existing classifier, rather than through a simultaneous optimization procedure, the classifier is also fast to train.
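The root-finding step described above can be illustrated with a minimal sketch. It is one possible reading of the idea, not the authors' implementation: the function names (`find_root`, `numerical_gradient`, `sample_border`, `classify`), the use of bisection, the finite-difference gradient, and the nearest-border-sample decision rule are all assumptions made for illustration. The toy function `R` stands in for the difference in conditional probabilities, P(+1|x) - P(-1|x), that a trained kernel classifier would supply.

```python
import numpy as np

def find_root(R, x_neg, x_pos, tol=1e-8, max_iter=200):
    """Bisect along the segment from x_neg (R < 0) to x_pos (R > 0) for R = 0."""
    lo, hi = 0.0, 1.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if R(x_neg + mid * (x_pos - x_neg)) < 0:
            lo = mid          # root lies further towards x_pos
        else:
            hi = mid          # root lies further towards x_neg
        if hi - lo < tol:
            break
    return x_neg + 0.5 * (lo + hi) * (x_pos - x_neg)

def numerical_gradient(R, x, h=1e-5):
    """Central-difference estimate of the gradient of R at x."""
    g = np.zeros(x.size)
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = h
        g[i] = (R(x + e) - R(x - e)) / (2.0 * h)
    return g

def sample_border(R, X_neg, X_pos, n_samples, rng):
    """Collect points on the decision boundary together with unit normals."""
    borders, normals = [], []
    for _ in range(n_samples):
        a = X_neg[rng.integers(len(X_neg))]
        b = X_pos[rng.integers(len(X_pos))]
        if R(a) >= 0 or R(b) <= 0:
            continue          # bisection needs a sign change along the segment
        root = find_root(R, a, b)
        g = numerical_gradient(R, root)
        norm = np.linalg.norm(g)
        if norm == 0:
            continue          # no usable normal direction at this point
        borders.append(root)
        normals.append(g / norm)
    return np.array(borders), np.array(normals)

def classify(x, borders, normals):
    """Use the linear classifier attached to the nearest border sample."""
    k = np.argmin(np.sum((borders - x) ** 2, axis=1))
    return 1 if normals[k] @ (x - borders[k]) > 0 else -1

# Toy stand-in for P(+1|x) - P(-1|x): class +1 lies outside the unit circle.
def R(x):
    return np.tanh(x @ x - 1.0)

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 2))
r2 = np.sum(X ** 2, axis=1)
X_neg, X_pos = X[r2 < 1.0], X[r2 > 1.0]

borders, normals = sample_border(R, X_neg, X_pos, n_samples=50, rng=rng)
label_inside = classify(np.array([0.0, 0.0]), borders, normals)
label_outside = classify(np.array([3.0, 0.0]), borders, normals)
```

In this sketch, classifying a test point costs one nearest-neighbour search over the border samples plus a single dot product, so the cost depends on the number of border samples rather than on the number of training points retained by the kernel classifier. In the toy example the border samples should all lie on the unit circle, with the origin labelled -1 and the point (3, 0) labelled +1.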