In this work, we consider learning sparse models in large-scale settings, where the number of samples and the feature dimension can reach millions or even billions. Two issues arise immediately in such a challenging scenario: (i) computational cost and (ii) memory overhead. In particular, the memory issue rules out a large body of prior algorithms that are based on batch optimization techniques. To remedy this problem, we propose to learn sparse models such as Lasso in an online manner, where in each iteration only one randomly chosen sample is revealed and used to update a sparse iterate. As a result, the memory cost is independent of the sample size, and the gradient evaluation for a single sample is efficient. Perhaps surprisingly, we find that, with the same parameter, the sparsity promoted by batch methods is not preserved in the online setting. We analyze this phenomenon and present some effective variants, including mini-batch methods and a hard-thresholding-based stochastic gradient algorithm. Extensive experiments on a public dataset support our findings and algorithms.
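To make the online setting concrete, the following is a minimal sketch of a single-sample update for sparse least-squares regression, offered as an illustration rather than as the authors' exact algorithm. It assumes a squared loss, a constant step size, and two interchangeable sparsification steps: soft thresholding (a standard stochastic proximal step for the Lasso penalty) and hard thresholding that keeps the `k` largest-magnitude coordinates. All function and parameter names (`soft_threshold`, `hard_threshold`, `online_sparse_regression`, `step`, `lam`, `k`) are hypothetical and chosen for this example only.

```python
import numpy as np


def soft_threshold(w, tau):
    """Shrink every coordinate of w toward zero by tau (prox of tau * ||w||_1)."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)


def hard_threshold(w, k):
    """Keep the k largest-magnitude coordinates of w and zero out the rest."""
    if k <= 0:
        return np.zeros_like(w)
    if k >= w.size:
        return w.copy()
    out = np.zeros_like(w)
    keep = np.argpartition(np.abs(w), -k)[-k:]  # indices of the k largest |w_j|
    out[keep] = w[keep]
    return out


def online_sparse_regression(X, y, step=0.01, lam=0.1, k=None,
                             n_iters=5000, seed=0):
    """Single-sample stochastic updates for sparse least squares (illustrative).

    If k is None, each gradient step is followed by soft thresholding with
    threshold step * lam; otherwise it is followed by hard thresholding to
    at most k nonzero coordinates.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)                    # reveal one random sample
        grad = (X[i] @ w - y[i]) * X[i]        # gradient of 0.5 * (x_i^T w - y_i)^2
        w = w - step * grad
        if k is None:
            w = soft_threshold(w, step * lam)  # Lasso-style proximal step
        else:
            w = hard_threshold(w, k)           # enforce at most k nonzeros
    return w
```

In this sketch only the current sample and the iterate `w` are touched in each iteration, so the working memory of the update does not grow with the number of samples. With hard thresholding the iterate has at most `k` nonzero entries by construction, whereas the soft-thresholding variant only shrinks each coordinate by `step * lam` per step and gives no such guarantee on the sparsity level.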