AutoML platforms offer numerous candidate algorithms for each step of the analysis, i.e., different possible algorithms for imputation, transformation, feature selection, and modelling. Finding the optimal combination of algorithms and hyper-parameter values is computationally expensive, because the number of combinations to explore grows exponentially. In this paper, we present the Sequential Hyper-parameter Space Reduction (SHSR) algorithm, which reduces the search space of an AutoML tool with a negligible drop in its predictive performance. SHSR is a meta-level learning algorithm that analyzes past runs of an AutoML tool on several datasets and learns which hyper-parameter values to filter out of consideration when a new dataset is analyzed. SHSR is evaluated on 284 classification and 375 regression problems, showing an approximately 30% reduction in execution time with a performance drop of less than 0.1%.
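To make the size argument concrete, the following minimal sketch counts the configurations in a toy pipeline search space and shows how filtering out a few values shrinks it. It is not the SHSR algorithm: the step names, candidate values, and the `EXCLUDED` set are hypothetical and hand-picked for illustration, whereas SHSR would learn which values to exclude from past runs of the AutoML tool.

```python
from itertools import product

# Hypothetical search space; every name and value here is illustrative only.
SEARCH_SPACE = {
    "imputation": ["mean", "median", "knn"],
    "transformation": ["none", "standardize", "log"],
    "feature_selection": ["none", "lasso", "mutual_info"],
    "model": ["ridge", "random_forest", "svm", "gbm"],
}

# Hand-picked exclusions (an assumption). A meta-level learner such as SHSR
# would derive these from past runs instead of hard-coding them.
EXCLUDED = {"imputation": {"knn"}, "model": {"svm"}}


def count_configurations(space):
    """Number of pipelines = product of the number of options per step."""
    total = 1
    for values in space.values():
        total *= len(values)
    return total


def filter_space(space, excluded):
    """Return a copy of `space` with the excluded values removed per step."""
    reduced = {}
    for step, values in space.items():
        kept = [v for v in values if v not in excluded.get(step, set())]
        if not kept:
            raise ValueError(f"all options removed for step {step!r}")
        reduced[step] = kept
    return reduced


full_count = count_configurations(SEARCH_SPACE)
reduced_space = filter_space(SEARCH_SPACE, EXCLUDED)
reduced_count = count_configurations(reduced_space)

# Enumerate the remaining pipelines explicitly as (step -> choice) dicts.
pipelines = [
    dict(zip(reduced_space, combo)) for combo in product(*reduced_space.values())
]

print(f"full space: {full_count} configurations")
print(f"reduced space: {reduced_count} configurations")
```

In this toy space, removing one value from each of two steps halves the number of pipelines to evaluate, because the counts multiply across steps. The reduction in configurations is not the same quantity as the reduction in execution time reported above, since individual configurations differ in cost.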