Performance benchmarking of HPC systems is an ongoing effort that seeks to provide information that can be used to increase system performance and improve the job schedulers that manage these systems. We develop a benchmarking tool that uses machine learning models and gathers performance data on GPU-accelerated nodes while they perform material segmentation analysis. The benchmark uses an ML model that has been converted from Caffe to PyTorch using the MMdnn toolkit, together with the MINC-2500 dataset. Performance data is gathered on two ERDC DSRC systems, Onyx and Vulcanite. The data reveals that while Vulcanite has faster model times in a large number of benchmarks, it is also more subject to environmental factors that can make its performance slower than that of Onyx. In contrast, the model times from Onyx are consistent across benchmarks.