Training and Hyperparameter Optimization (HPO) of deep learning-based AI models are often compute-intensive and call for large-scale distributed resources as well as scalable, resource-efficient hyperparameter search algorithms. This work studies the potential of using model performance prediction to aid the HPO process carried out on High Performance Computing systems. In addition, a quantum annealer is used to train the performance predictor, and a method is proposed to overcome some of the problems arising from the current limitations of quantum systems and to increase the stability of solutions. This yields results on a quantum machine comparable to those obtained on a classical machine, showing how quantum computers could be integrated into classical machine learning tuning pipelines. Furthermore, results are presented from the development of a containerized benchmark based on an AI model for collision event reconstruction, which allows us to compare and assess the suitability of different hardware accelerators for training deep neural networks.