Training deep networks and tuning hyperparameters on large datasets are computationally intensive. One of the primary research directions for efficient training is to reduce training costs by selecting subsets of the training data that generalize well. Existing intelligent subset selection approaches are not competitive with simple adaptive random subset selection baselines because their subset selection step is time-consuming: it requires computing model-dependent gradients and feature embeddings and applying greedy maximization of submodular objectives. Our key insight is that removing the reliance on downstream model parameters allows subset selection to run as a pre-processing step, so that multiple models can be trained at no additional cost. In this work, we propose MILO, a model-agnostic subset selection framework that decouples subset selection from model training while enabling superior model convergence and performance through an easy-to-hard curriculum. Our empirical results indicate that MILO can train models $3\times$ to $10\times$ faster and tune hyperparameters $20\times$ to $75\times$ faster than full-dataset training or tuning without compromising performance.
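To make the idea of selection as a pre-processing step concrete, the following is a minimal sketch of greedy maximization of one submodular objective over fixed, model-independent feature vectors. It is not MILO's actual algorithm: the choice of the facility-location objective, the cosine similarity, and the names `cosine_similarity` and `greedy_facility_location` are all assumptions made for illustration, and the easy-to-hard curriculum is not modeled here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def greedy_facility_location(features, budget):
    """Greedily pick up to `budget` indices maximizing sum_i max_{j in S} sim(i, j).

    Assumption: `features` are precomputed, model-independent vectors
    (for example, from a fixed pretrained encoder), so this runs once,
    before any downstream model is trained.
    """
    n = len(features)
    budget = min(budget, n)
    # Clip similarities at zero so the objective stays monotone submodular.
    sim = [
        [max(0.0, cosine_similarity(features[i], features[j])) for j in range(n)]
        for i in range(n)
    ]
    best_cover = [0.0] * n  # best similarity of each point to the current subset
    selected = []
    remaining = set(range(n))
    for _ in range(budget):
        best_gain, best_j = -1.0, None
        for j in sorted(remaining):
            # Marginal gain of adding j: total improvement in coverage.
            gain = sum(max(sim[i][j] - best_cover[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        remaining.remove(best_j)
        best_cover = [max(best_cover[i], sim[i][best_j]) for i in range(n)]
    return selected


# Toy usage: two tight clusters; a budget of 2 picks one point from each.
toy_features = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]]
subset = greedy_facility_location(toy_features, budget=2)
```

Because the selection depends only on the feature vectors and not on any trained model's parameters, the resulting `subset` can be computed once and reused across different architectures or hyperparameter configurations. The sketch builds a dense similarity matrix and rescans every candidate at each step, so it is only suitable for small illustrative inputs.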