Big-data applications often involve a vast number of observations and features, creating new challenges for variable selection and parameter estimation. This paper presents a novel technique called ``slow kill,'' which combines nonconvex constrained optimization, adaptive $\ell_2$-shrinkage, and increasing learning rates. Because the problem size can decrease over the slow kill iterations, the method is particularly effective for large-scale variable screening. The interaction between statistics and optimization provides valuable insights into controlling the quantiles, stepsize, and shrinkage parameters so as to relax the regularity conditions required to achieve the desired level of statistical accuracy. Experimental results on real and synthetic data show that slow kill outperforms state-of-the-art algorithms in various situations while remaining computationally efficient on large-scale data.
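To make the ingredients named above concrete, the following is a minimal sketch of one way a cardinality-constrained least-squares iteration could combine quantile thresholding, $\ell_2$-shrinkage, and a step size that grows as features are eliminated. It is not the paper's algorithm: the function name `slow_kill_sketch`, the geometric schedule `decay`, the fixed shrinkage `eta`, and the squared-error loss are all assumptions made for illustration, and the abstract does not specify how the actual method sets these quantities.

```python
import numpy as np


def slow_kill_sketch(X, y, q_target, n_iter=100, decay=0.8, eta=0.01):
    """Illustrative sketch only; the schedule and parameters are assumptions."""
    n, p = X.shape
    if not 1 <= q_target <= p:
        raise ValueError("q_target must be between 1 and the number of features")
    active = np.arange(p)   # indices of features that are still alive
    beta = np.zeros(p)
    q = p                   # current cardinality bound, tightened gradually
    for _ in range(n_iter):
        Xa = X[:, active]
        # Step size 1/rho, with rho the squared spectral norm of the active
        # columns. Dropping columns cannot increase rho, so the step size is
        # nondecreasing across iterations.
        rho = np.linalg.norm(Xa, 2) ** 2
        grad = Xa.T @ (Xa @ beta[active] - y)
        # Gradient step followed by multiplicative l2-shrinkage.
        b = (beta[active] - grad / rho) / (1.0 + eta)
        # Quantile thresholding: keep only the q largest magnitudes.
        q = max(q_target, int(np.ceil(decay * q)))
        keep = np.sort(np.argsort(np.abs(b))[-q:])
        beta = np.zeros(p)
        beta[active[keep]] = b[keep]
        # Killed features never return, so later iterations work on fewer columns.
        active = active[keep]
    return beta
```

In this sketch the matrix products in later iterations involve only the surviving columns, which is the sense in which the problem size decreases. A call such as `slow_kill_sketch(X, y, q_target=5)` returns a coefficient vector with at most five nonzero entries.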