State-of-the-art NPUs are typically architected as self-contained subsystems with multiple heterogeneous hardware computing modules and a dataflow-driven programming model. The industry lacks well-established methodologies and tools for evaluating and comparing the performance of NPUs across different architectures. We present VPU-EM, an event-based performance modeling framework targeting scalable performance evaluation of modern NPUs across diverse AI workloads. The framework adopts a high-level, event-based system-simulation methodology that abstracts away design details for speed while preserving hardware pipelining, concurrency, and interaction with software task scheduling. It is developed natively in Python and built to interface directly with AI frameworks such as TensorFlow, PyTorch, ONNX, and OpenVINO, linking various in-house NPU graph compilers to achieve optimized full-model performance. Furthermore, VPU-EM can model the power characteristics of an NPU in Power-EM mode, enabling joint performance/power analysis. Using VPU-EM, we conduct performance/power analysis of models from representative neural network architectures. We demonstrate that even though this framework was developed for Intel VPU, an Intel in-house NPU IP technology, the methodology can be generalized to the analysis of modern NPUs.
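To make the event-based simulation idea concrete, the following is a minimal sketch of how a discrete-event model can capture pipelining and concurrency between two hardware modules without modeling their internals. It is not VPU-EM code: the function name `simulate_pipeline`, the two-stage DMA/compute pipeline, and the cycle counts are hypothetical assumptions chosen only for illustration.

```python
import heapq
from itertools import count


def simulate_pipeline(num_tiles, dma_cycles, compute_cycles):
    """Return the cycle at which the last tile finishes compute.

    Hypothetical two-stage model: a DMA engine loads tiles back to back,
    and a compute engine processes each tile once it is loaded and the
    engine is idle. Each stage is abstracted to a fixed latency.
    """
    events = []           # min-heap of (time, seq, kind, tile)
    seq = count()         # tie-breaker for events with equal timestamps
    compute_free_at = 0   # cycle at which the compute engine becomes idle
    finish_time = 0

    # The DMA engine issues loads back to back, so tile i is ready
    # after (i + 1) * dma_cycles.
    for tile in range(num_tiles):
        ready = (tile + 1) * dma_cycles
        heapq.heappush(events, (ready, next(seq), "dma_done", tile))

    # Main event loop: always handle the earliest pending event.
    while events:
        now, _, kind, tile = heapq.heappop(events)
        if kind == "dma_done":
            # Compute starts when both the data and the engine are ready.
            start = max(now, compute_free_at)
            compute_free_at = start + compute_cycles
            heapq.heappush(
                events, (compute_free_at, next(seq), "compute_done", tile)
            )
        else:  # "compute_done"
            finish_time = max(finish_time, now)
    return finish_time


if __name__ == "__main__":
    pipelined = simulate_pipeline(num_tiles=4, dma_cycles=10, compute_cycles=15)
    serial = 4 * (10 + 15)
    print(f"pipelined: {pipelined} cycles, fully serial: {serial} cycles")
```

In this sketch, the overlap between DMA and compute emerges from event ordering alone, which is the property that lets an event-based model run quickly while still reflecting hardware concurrency. A real framework would add more module types, contention on shared resources, and a scheduler that issues tasks from a compiled graph.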