Accurately predicting deep neural network (DNN) inference performance metrics, such as latency, power, and memory footprint, for an arbitrary DNN on a target hardware platform is essential to the (manual or automatic) design, optimization, and deployment of practical DNNs on that platform. Unfortunately, these metrics are slow to evaluate using simulators (where available) and typically require measurement on the target hardware. This work describes PerfSAGE, a novel graph neural network (GNN) that predicts inference latency, energy, and memory footprint for an arbitrary DNN TFLite graph (TFL, 2017). In contrast, previously published performance predictors can only predict latency and are restricted to pre-defined construction rules or search spaces. This paper also describes the EdgeDLPerf dataset of 134,912 DNNs randomly sampled from four task search spaces and annotated with inference performance metrics from three edge hardware platforms. Using this dataset, we train PerfSAGE and provide experimental results that demonstrate state-of-the-art prediction accuracy, with a Mean Absolute Percentage Error (MAPE) of <5% across all targets and model search spaces. These results (1) outperform previous state-of-the-art GNN-based predictors (Dudziak et al., 2020), (2) accurately predict performance on accelerators, a shortfall of non-GNN-based predictors (Zhang et al., 2021), and (3) demonstrate predictions on arbitrary input graphs without modifications to the feature extractor.
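For readers unfamiliar with the accuracy metric quoted above, the following is a minimal sketch of how a Mean Absolute Percentage Error is commonly computed. The abstract does not give the formula, so the standard definition (the mean of |measured - predicted| / |measured|, expressed in percent) is assumed here. The function name `mape` and the latency values are hypothetical and purely illustrative; they are not taken from the paper or from EdgeDLPerf.

```python
from typing import Sequence


def mape(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent (standard definition, assumed)."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    if len(measured) == 0:
        raise ValueError("at least one sample is required")
    if any(m == 0 for m in measured):
        raise ValueError("MAPE is undefined when a measured value is zero")
    # Average the per-sample relative errors, then scale to percent.
    total = sum(abs((m - p) / m) for m, p in zip(measured, predicted))
    return 100.0 * total / len(measured)


# Hypothetical latencies in milliseconds (made up for illustration only).
measured_ms = [10.0, 20.0, 40.0]
predicted_ms = [10.5, 19.0, 42.0]
print(f"MAPE: {mape(measured_ms, predicted_ms):.2f}%")
```

In this made-up example every prediction is off by 5% of the measured value, so the resulting MAPE is about 5%, which sits right at the threshold quoted in the abstract.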