In recent years, regularization models have received considerable attention because high-dimensional data are now common in scientific research. Sparse support vector machines (SVMs) are useful tools for high-dimensional data analysis and have been widely used in econometrics. Nevertheless, the non-smoothness of the objective functions and constraints presents computational challenges for many existing solvers when the covariates are ultra-high dimensional. In this paper, we design efficient and parallelizable algorithms for solving sparse SVM problems with high-dimensional data by splitting the feature space. The proposed algorithm is based on the alternating direction method of multipliers (ADMM). We establish the rate of convergence of the proposed ADMM method and compare it with existing solvers in various high- and ultra-high-dimensional settings. Because the algorithm is compatible with parallel computing, it can further alleviate the storage and scalability limitations of a single machine in large-scale data processing.
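For orientation, the kind of problem described above can be sketched as follows. This is a generic illustration and not necessarily the formulation used in the paper: it assumes the common $\ell_1$-penalized hinge-loss form of the sparse SVM, and the symbols $n$, $K$, $\lambda$, $\beta_0$, $\beta$, $X_k$, $\beta_k$, and $z$ are introduced here for illustration only.

```latex
% Assumed sparse SVM: hinge loss plus an l1 penalty (illustrative form).
\min_{\beta_0,\,\beta}\;
  \frac{1}{n}\sum_{i=1}^{n}\bigl[\,1 - y_i\,(\beta_0 + x_i^{\top}\beta)\,\bigr]_{+}
  \;+\; \lambda\,\lVert \beta \rVert_{1}

% Feature space split: partition the columns of X into K blocks.
X = [\,X_1,\dots,X_K\,], \qquad
\beta = (\beta_1^{\top},\dots,\beta_K^{\top})^{\top}, \qquad
X\beta = \sum_{k=1}^{K} X_k\beta_k, \qquad
\lVert \beta \rVert_{1} = \sum_{k=1}^{K} \lVert \beta_k \rVert_{1}

% Equivalent constrained form with an auxiliary variable z, the shape to which ADMM is usually applied.
\min_{\beta_0,\,\beta_1,\dots,\beta_K,\,z}\;
  \frac{1}{n}\sum_{i=1}^{n}\bigl[\,1 - y_i\,(\beta_0 + z_i)\,\bigr]_{+}
  \;+\; \lambda\sum_{k=1}^{K}\lVert \beta_k \rVert_{1}
  \quad \text{subject to} \quad z = \sum_{k=1}^{K} X_k\beta_k
```

In a form like this, the penalty separates across the blocks $\beta_k$ and each block touches only its own columns $X_k$, which is what would allow the block updates to run in parallel on different machines.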