Performance modelling of a deep learning application is essential for quantifying and improving the efficiency of the underlying framework. However, existing performance models are mostly case-specific and generalize poorly to new deep learning frameworks and applications. In this paper, we propose a generic performance model for an application running in a distributed environment. The model is built on a generic expression of the application execution time that accounts for both intrinsic factors/operations (e.g., algorithmic parameters and internal operations) and extrinsic scaling factors (e.g., the number of processors, data chunks, and batch size). We formulate the fitting of this expression as a global optimization problem and solve it using a regularized cost function and a differential evolution algorithm, which finds the best-fit values of the constants in the generic expression so that it matches the experimentally measured computation time. We evaluate the proposed model on three deep learning frameworks: TensorFlow, MXNet, and PyTorch. The experimental results show that the model provides accurate performance predictions while remaining interpretable. In addition, it can be applied to any distributed deep neural network without instrumenting the code, and it provides insight into the factors affecting performance and scalability.
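The fitting step described in the abstract (a regularized cost function minimized by differential evolution) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not give the generic expression, so the form `t(p, b) = c0 + c1 * b / p + c2 * p` is hypothetical, as are the names `predict_time`, `cost`, and `differential_evolution`, the L2 penalty, the bounds, and the synthetic timings, which stand in for measured data. The optimizer is a plain DE/rand/1/bin variant written with the standard library only, not necessarily the variant used in the paper.

```python
import random

# Hypothetical generic expression (an assumption, not the paper's formula):
#   t(p, b) = c0 + c1 * b / p + c2 * p
# c0: fixed overhead, c1: per-sample compute cost split over p processors,
# c2: overhead that grows with the processor count p.
def predict_time(c, p, b):
    return c[0] + c[1] * b / p + c[2] * p

def cost(c, samples, lam=1e-6):
    """Mean squared error against the timings plus an L2 penalty on the constants."""
    mse = sum((predict_time(c, p, b) - t) ** 2 for p, b, t in samples) / len(samples)
    return mse + lam * sum(x * x for x in c)

def differential_evolution(f, bounds, pop_size=40, generations=400,
                           F=0.7, CR=0.9, seed=0):
    """Minimal DE/rand/1/bin minimizer over box-constrained parameters."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct population members other than i.
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][j]
                trial.append(v)
            s = f(trial)
            if s <= scores[i]:  # greedy selection: keep the better vector
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

# Synthetic placeholder timings generated from known constants (not measurements).
TRUE_C = (2.0, 0.5, 0.1)
samples = [(p, b, predict_time(TRUE_C, p, b))
           for p in (1, 2, 4, 8, 16) for b in (32, 64, 128)]

best_c, best_cost = differential_evolution(lambda c: cost(c, samples),
                                           [(0.0, 10.0)] * 3)
print("fitted constants:", best_c)
print("regularized cost:", best_cost)
```

In practice the `samples` list would hold measured execution times for each configuration. The fitted constants then indicate how much each term contributes to the total time, which is where the interpretability mentioned in the abstract would come from.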