Training deep networks and tuning hyperparameters on large datasets are computationally intensive. One of the primary research directions for efficient training is to reduce training costs by selecting subsets of the training data that generalize well. Existing intelligent subset selection approaches, however, are not competitive with simple adaptive random subset selection baselines, because the selection step itself is time-consuming: it requires computing model-dependent gradients and feature embeddings and greedily maximizing submodular objectives. Our key insight is that removing the reliance on downstream model parameters turns subset selection into a pre-processing step, so that multiple models can be trained on the selected subsets at no additional selection cost. In this work, we propose MILO, a model-agnostic subset selection framework that decouples subset selection from model training while enabling superior model convergence and performance through an easy-to-hard curriculum. Our empirical results indicate that MILO can train models $3\times$ to $10\times$ faster and tune hyperparameters $20\times$ to $75\times$ faster than full-dataset training or tuning, without compromising performance.
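To make the idea of model-independent selection concrete, here is a minimal sketch of greedy maximization of a submodular objective over fixed feature vectors. It is an illustration under stated assumptions, not MILO's actual procedure: the facility-location objective, the cosine similarity, the function name `greedy_facility_location`, and the toy `features` array are all choices made for this example. Because nothing in it depends on the parameters of the model being trained, the selection can run once before training.

```python
import numpy as np


def greedy_facility_location(features: np.ndarray, budget: int) -> list[int]:
    """Greedily pick `budget` row indices that maximize the facility-location
    objective f(S) = sum_i max_{j in S} sim(i, j).

    `features` is assumed to be precomputed and independent of the model
    that will later be trained on the subset (hypothetical setup).
    """
    # Cosine similarity, shifted from [-1, 1] to [0, 1] so it is non-negative.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = (unit @ unit.T + 1.0) / 2.0

    n = sim.shape[0]
    coverage = np.zeros(n)            # best similarity of each point to the chosen set
    chosen = np.zeros(n, dtype=bool)  # mask of already selected indices
    selected: list[int] = []

    for _ in range(min(budget, n)):
        # Marginal gain of candidate j: total coverage improvement over all points.
        gains = np.maximum(sim - coverage[:, None], 0.0).sum(axis=0)
        gains[chosen] = -np.inf       # never pick the same index twice
        j = int(np.argmax(gains))
        selected.append(j)
        chosen[j] = True
        coverage = np.maximum(coverage, sim[:, j])

    return selected


# Toy data: two tight clusters of two points each (illustrative values only).
features = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]])
subset = greedy_facility_location(features, budget=2)
```

The returned index list can be stored and reused for every model or hyperparameter configuration trained afterwards, which is the sense in which selection becomes a one-time pre-processing cost. This dense version uses O(n^2) memory for the similarity matrix, so it is only suitable for small examples.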