Massive volumes of data are now generated in fields such as computer vision, medical imaging, astronomy, and web information tracking, and their sheer size hampers the implementation of many statistical algorithms. Subsampling is an efficient and popular way to reduce the computational burden. Previous studies have focused on subsampling algorithms for non-regularized regression, such as ordinary least squares regression and logistic regression. In this article, we introduce a flexible and efficient subsampling algorithm based on A-optimality for Elastic-net regression. Theoretical results describing the statistical properties of the proposed algorithm are given. Four numerical examples examine the empirical performance of the technique. Finally, the algorithm is applied to two real-world datasets, Blog and 2D-CT slice, where it significantly outperforms the traditional leveraging subsampling method.
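For orientation, the traditional leveraging subsampling baseline mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration for ordinary least squares only, not the proposed A-optimal Elastic-net algorithm, whose sampling probabilities are not given in the abstract. The function name `leverage_subsample_ols`, its parameters, and the choice of sampling with replacement are assumptions made for this example.

```python
import numpy as np

def leverage_subsample_ols(X, y, r, seed=0):
    """Hypothetical helper: leverage-score subsampling + reweighted OLS."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Leverage scores are the squared row norms of Q from a thin QR of X.
    Q, _ = np.linalg.qr(X)
    lev = np.sum(Q**2, axis=1)
    probs = lev / lev.sum()
    # Draw r rows with replacement according to the leverage probabilities
    # (sampling with replacement is an assumption of this sketch).
    idx = rng.choice(n, size=r, replace=True, p=probs)
    # Scale each sampled row by 1/sqrt(r * p_i) so the subsampled squared
    # loss is an unbiased estimate of the full-data squared loss.
    w = 1.0 / np.sqrt(r * probs[idx])
    beta, *_ = np.linalg.lstsq(X[idx] * w[:, None], y[idx] * w, rcond=None)
    return beta, idx
```

Calling `leverage_subsample_ols(X, y, r=500)` on an `n x p` design matrix returns a coefficient estimate computed from only 500 reweighted rows together with the sampled row indices. An A-optimality-based scheme would replace `probs` with probabilities chosen to minimize the trace of the estimator's asymptotic variance.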