Algorithms for solving the linear classification problem have a long history, dating back at least to 1936 with linear discriminant analysis. For linearly separable data, many algorithms can efficiently obtain the exact solution to the corresponding 0-1 loss classification problem, but for data that is not linearly separable, this problem has been shown to be NP-hard in full generality. Alternative approaches all involve approximations of some kind, including the use of surrogates for the 0-1 loss (for example, the hinge or logistic loss) or approximate combinatorial search, none of which can be guaranteed to solve the problem exactly. Finding efficient algorithms that obtain an exact (i.e., globally optimal) solution for the 0-1 loss linear classification problem in fixed dimension remains an open problem. In the research we report here, we detail the construction of a new algorithm, incremental cell enumeration (ICE), that can solve the 0-1 loss classification problem exactly in polynomial time. To our knowledge, this is the first rigorously proven polynomial-time algorithm for this long-standing problem.
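To make the distinction between the 0-1 loss and its surrogates concrete, the following is a minimal sketch in Python. It is not the ICE algorithm; the function names, the toy one-dimensional data set, and the two candidate classifiers are all assumptions chosen for illustration. It evaluates the 0-1 loss (the number of misclassified points) alongside the hinge and logistic surrogates for a linear rule sign(w·x + b).

```python
import math

def score(w, b, x):
    """Linear score w.x + b for one data point x (illustrative helper)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def zero_one_loss(w, b, X, y):
    """Number of points whose predicted sign differs from the label in {-1, +1}."""
    errors = 0
    for x, label in zip(X, y):
        predicted = 1 if score(w, b, x) >= 0 else -1
        if predicted != label:
            errors += 1
    return errors

def hinge_loss(w, b, X, y):
    """Sum of max(0, 1 - y * score): a convex surrogate for the 0-1 loss."""
    return sum(max(0.0, 1.0 - label * score(w, b, x)) for x, label in zip(X, y))

def logistic_loss(w, b, X, y):
    """Sum of log(1 + exp(-y * score)): another convex surrogate."""
    return sum(math.log1p(math.exp(-label * score(w, b, x))) for x, label in zip(X, y))

# Hypothetical toy data: five points on a line that are not linearly separable.
X = [(-2.0,), (-1.0,), (1.0,), (2.0,), (3.0,)]
y = [-1, 1, -1, 1, 1]

# Two hypothetical candidate classifiers that share w and differ only in b.
for w, b in [((1.0,), 0.0), ((1.0,), -1.5)]:
    print(w, b,
          zero_one_loss(w, b, X, y),
          hinge_loss(w, b, X, y),
          round(logistic_loss(w, b, X, y), 3))
```

On this toy data the second classifier makes fewer mistakes (one misclassification rather than two) yet has the larger hinge loss (4.5 rather than 4.0), so ranking candidates by a surrogate need not agree with ranking them by 0-1 loss. This is one way to see why minimizing a surrogate does not, in general, guarantee an exact 0-1 loss solution.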