Benchmarking and co-design are essential for driving optimizations and innovation around ML models, ML software, and next-generation hardware. Full-workload benchmarks such as MLPerf play an essential role in enabling fair comparison across different software and hardware stacks, especially once systems are fully designed and deployed. However, the pace of AI innovation demands a more agile methodology for creating benchmarks and for using them in simulators and emulators for future system co-design. We propose Chakra, an open graph schema for standardizing workload specifications that capture key operations and dependencies, also known as Execution Traces (ETs). In addition, we propose a complementary set of tools and capabilities to enable the collection, generation, and adoption of Chakra ETs by a wide range of simulators, emulators, and benchmarks. For instance, we use generative AI models to learn latent statistical properties across thousands of Chakra ETs and use these models to synthesize new Chakra ETs. These synthetic ETs can obfuscate key proprietary information and can also target future what-if scenarios. As an example, we demonstrate an end-to-end proof of concept that converts PyTorch ETs to Chakra ETs and uses them to drive an open-source training system simulator (ASTRA-sim). Our end goal is to build a vibrant industry-wide ecosystem of agile benchmarks and tools to drive future AI system co-design.
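To make the idea of an execution trace as a graph of operations and dependencies more concrete, the following is a minimal Python sketch. It is not the Chakra schema itself: the `TraceNode` fields, the `"compute"`/`"comm"` categories, the operation names, and the durations are all hypothetical, chosen only for illustration. The sketch shows how a consumer such as a simulator might derive a dependency-respecting replay order and a simple critical-path estimate from such a graph.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class TraceNode:
    """One operation in a toy execution trace (hypothetical fields, not Chakra's)."""
    node_id: int
    name: str
    kind: str            # "compute" or "comm" (illustrative categories)
    duration_us: float   # assumed per-operation cost in microseconds
    deps: list[int] = field(default_factory=list)  # ids of prerequisite nodes


def replay_order(nodes):
    """Return node ids in an order that respects every dependency edge."""
    graph = {n.node_id: set(n.deps) for n in nodes}
    return list(TopologicalSorter(graph).static_order())


def critical_path_us(nodes):
    """Estimate end-to-end time, assuming independent nodes fully overlap."""
    by_id = {n.node_id: n for n in nodes}
    finish = {}
    for nid in replay_order(nodes):
        node = by_id[nid]
        # A node starts once all of its prerequisites have finished.
        start = max((finish[d] for d in node.deps), default=0.0)
        finish[nid] = start + node.duration_us
    return max(finish.values(), default=0.0)


# A made-up training step: the gradient all-reduce for layer 2 can overlap
# with the backward pass of layer 1; the optimizer waits for both.
trace = [
    TraceNode(0, "forward", "compute", 120.0),
    TraceNode(1, "backward_layer2", "compute", 240.0, deps=[0]),
    TraceNode(2, "all_reduce_layer2", "comm", 300.0, deps=[1]),
    TraceNode(3, "backward_layer1", "compute", 200.0, deps=[1]),
    TraceNode(4, "optimizer_step", "compute", 50.0, deps=[2, 3]),
]

order = replay_order(trace)
total_us = critical_path_us(trace)
```

Because the dependencies are explicit in the graph, the same trace can be replayed under different assumptions (for example, a slower or faster `all_reduce_layer2`) without re-running the original workload, which is the kind of what-if exploration the abstract describes.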