The training or fine-tuning of machine learning, vision, and language models is often implemented as a pipeline: a sequence of stages encompassing data preparation, model training, and evaluation. In this paper, we exploit pipeline structure to reduce the cost of hyperparameter tuning for model training/fine-tuning, which is particularly valuable for language models given their high cost in GPU-days. We propose a "memoization-aware" Bayesian Optimization (BO) algorithm, EEIPU, which works in tandem with a pipeline caching system, allowing it to evaluate significantly more hyperparameter candidates per GPU-day than other tuning algorithms. The result is better-quality hyperparameters in the same amount of search time, or equivalently, reduced search time to reach the same hyperparameter quality. In our benchmarks on machine learning (model ensembles), vision (convolutional architecture), and language (T5 architecture) pipelines, we compare EEIPU against recent BO algorithms: EEIPU produces an average of $103\%$ more hyperparameter candidates (within the same budget) and increases the validation metric by an average of $108\%$ more than other algorithms (measured from the end of the warm-up iterations).
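To illustrate the caching idea the abstract refers to, the sketch below shows one way prefix memoization can work in a staged pipeline. The key observation is that stage $i$'s output depends only on the hyperparameters of stages $1..i$, so candidates sharing a hyperparameter prefix can reuse cached stage outputs. All function and variable names here are hypothetical; this is a minimal sketch of the general technique, not the paper's actual caching system.

```python
# Hypothetical sketch of prefix memoization for a staged pipeline.
# Stage i's output depends only on the hyperparameters of stages 1..i,
# so two candidates that agree on a prefix share those stages' work.

cache = {}  # maps (stage_index, hyperparam_prefix) -> cached stage output


def run_pipeline(stages, hparams):
    """stages: list of callables f(prev_output, hp); hparams: one value per stage."""
    out = None
    for i, (stage, hp) in enumerate(zip(stages, hparams)):
        key = (i, tuple(hparams[: i + 1]))
        if key in cache:
            out = cache[key]      # cache hit: skip recomputing this stage
        else:
            out = stage(out, hp)  # cache miss: compute and store
            cache[key] = out
    return out
```

A memoization-aware acquisition function can then discount the expected cost of a candidate by the stages its prefix already has cached, favoring candidates that are both promising and cheap to evaluate.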