Automated Machine Learning (AutoML) significantly simplifies the deployment of machine learning models by automating tasks ranging from data preprocessing to model selection and ensembling. AutoML systems for tabular data often employ post hoc ensembling, where multiple models are combined to improve predictive accuracy. This typically results in longer inference times, a major limitation in practical deployments. To address this, we introduce a hardware-aware ensemble selection approach that integrates inference time into post hoc ensembling. By leveraging an existing framework for ensemble selection with quality diversity optimization, our method evaluates ensemble candidates for both their predictive accuracy and their hardware efficiency. This dual focus allows for a balanced consideration of accuracy and operational efficiency, enabling practitioners to choose from a Pareto front of accurate and efficient ensembles. Our evaluation on 83 classification datasets shows that our approach sustains competitive accuracy and can significantly improve ensembles' operational efficiency. The results of this study provide a foundation for extending these principles to additional hardware constraints, setting the stage for the development of more resource-efficient AutoML systems.
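The core idea of selecting from a Pareto front over accuracy and inference time can be illustrated with a minimal sketch. This is not the paper's quality-diversity algorithm itself, only the Pareto-dominance filter that would sit at its output; the candidate names and scores below are hypothetical.

```python
def pareto_front(candidates):
    """Return the candidates not dominated by any other.

    Each candidate is (name, accuracy, inference_time_s); higher
    accuracy and lower inference time are both preferred.
    """
    front = []
    for name, acc, t in candidates:
        dominated = any(
            # Another candidate is at least as good on both objectives
            # and strictly better on at least one.
            (acc2 >= acc and t2 <= t) and (acc2 > acc or t2 < t)
            for _, acc2, t2 in candidates
        )
        if not dominated:
            front.append((name, acc, t))
    return front


# Hypothetical ensemble candidates: (name, accuracy, inference time).
ensembles = [
    ("single_model", 0.85, 0.02),
    ("small_ensemble", 0.88, 0.05),
    ("large_ensemble", 0.90, 0.30),
    ("bloated_ensemble", 0.89, 0.40),  # dominated by large_ensemble
]

for name, acc, t in pareto_front(ensembles):
    print(f"{name}: acc={acc:.2f}, time={t:.2f}s")
```

A practitioner would then pick the point on the front that fits their latency budget, e.g. `small_ensemble` when inference must stay under 0.1 s per batch.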