The success of the machine learning field has reliably depended on training on large datasets. While effective, this trend comes at an extraordinary cost, driven by two deeply intertwined factors: the size of models and the size of datasets. While promising research efforts focus on reducing model size, the other half of the equation remains fairly mysterious. Indeed, it is surprising that the standard approach to training is still to iterate over and over, uniformly sampling the training dataset. In this paper we explore a series of alternative training paradigms that leverage insights from hard-data-mining and dropout, and that are simple enough to implement and use that they could become the new training standard. The proposed Progressive Data Dropout reduces the number of effective epochs to as little as 12.4% of the baseline. These savings come at no cost in accuracy; surprisingly, the proposed method improves accuracy by up to 4.82%. Our approach requires no changes to model architecture or optimizer and can be applied across standard training pipelines, thus posing an excellent opportunity for wide adoption. Code can be found here: https://github.com/bazyagami/LearningWithRevision
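The abstract names Progressive Data Dropout but does not spell out its mechanism; the sketch below is a minimal illustration of one plausible variant, assuming that samples the model already classifies correctly are dropped from subsequent epochs (a hard-data-mining-style criterion). The function names, the dropping rule, and the training hyperparameters are hypothetical choices for illustration, not the paper's exact algorithm; see the linked repository for the authors' implementation.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset, Subset


class IndexedDataset(Dataset):
    """Wrap a (x, y) dataset so each item also reports its global index."""
    def __init__(self, base):
        self.base = base

    def __len__(self):
        return len(self.base)

    def __getitem__(self, i):
        x, y = self.base[i]
        return x, y, i


def train_with_progressive_dropout(model, dataset, epochs=10, batch_size=128,
                                   lr=1e-3, device="cpu"):
    """Hypothetical sketch: keep only still-misclassified samples each epoch."""
    model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    indexed = IndexedDataset(dataset)
    active = list(range(len(dataset)))  # indices still in the training pool

    for epoch in range(epochs):
        loader = DataLoader(Subset(indexed, active),
                            batch_size=batch_size, shuffle=True)
        still_hard = []
        for x, y, idx in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            logits = model(x)
            loss = criterion(logits, y)
            loss.backward()
            optimizer.step()
            # Samples the model still misclassifies stay in the pool;
            # correctly classified ones are dropped for later epochs.
            wrong = logits.argmax(dim=1).cpu() != y.cpu()
            still_hard.extend(idx[wrong].tolist())
        print(f"epoch {epoch}: trained on {len(active)} samples, "
              f"{len(still_hard)} remain")
        if not still_hard:
            break
        active = still_hard
    return model
```

Because the pool of active samples shrinks monotonically, the number of effective epochs (full-dataset-equivalent passes) can fall well below the nominal epoch count, which is the kind of saving the abstract quantifies; the exact schedule and stopping rule used in the paper may differ from this sketch.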