Performance tuning, software/hardware co-design, and job scheduling are among the many tasks that rely on models to predict application performance. We propose and evaluate low-rank tensor decomposition for modeling application performance. We discretize the input and configuration domains of an application using regular grids. Execution times that fall within each grid cell are averaged, and each average is stored as one tensor element. We show that low-rank canonical-polyadic (CP) tensor decomposition is effective at approximating these tensors. We further show that this decomposition enables accurate extrapolation to unobserved regions of an application's parameter space. We then employ tensor completion to optimize a CP decomposition from a sparse set of observed execution times. We compare against alternative piecewise/grid-based models and supervised learning models on six applications and demonstrate that CP decomposition optimized via tensor completion offers higher prediction accuracy and memory efficiency for high-dimensional performance modeling.