Given the increasing complexity of AI applications, traditional spatial architectures frequently fall short. Our analysis identifies a pattern of interconnected, multi-faceted tasks that span both AI and general computational processes. In response, we introduce "Orchestrated AI Workflows," an approach that integrates diverse tasks with logic-driven decisions into dynamic, sophisticated workflows. We find that the intrinsic Dual Dynamicity of Orchestrated AI Workflows, namely the dynamic execution times and dynamic execution frequencies of Task Blocks, can be effectively represented using the Orchestrated Workflow Graph. This Dual Dynamicity also poses challenges to existing spatial architectures, namely Indiscriminate Resource Allocation, Reactive Load Rebalancing, and Contagious PEA Idleness. To overcome these challenges, we present Octopus, a scale-out spatial architecture paired with a suite of scheduling strategies optimized for executing Orchestrated AI Workflows, including the Discriminate Dual-Scheduling Mechanism, the Adaptive TBU Scheduling Strategy, and the Proactive Cluster Scheduling Strategy. Our evaluations demonstrate that Octopus significantly outperforms traditional architectures in handling the dynamic demands of Orchestrated AI Workflows and scales robustly to large-scale hardware such as wafer-scale chips.
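To make the notion of Dual Dynamicity concrete, the following is a minimal illustrative sketch, not the Orchestrated Workflow Graph or the Octopus scheduler themselves. It assumes a toy workflow of three hypothetical Task Blocks (`detect`, `refine`, `report`) in which each block's execution time depends on its input and a logic-driven branch decides how many times `refine` runs. All names, cost models, and the `run_workflow` helper are assumptions made for illustration only.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical model: a Task Block takes the current workload size and returns
# (elapsed time, new workload size, name of the next block or None to stop).
TaskBlock = Callable[[int], Tuple[int, int, Optional[str]]]


def detect(n: int) -> Tuple[int, int, Optional[str]]:
    # Execution time grows with the input size (dynamic execution time).
    return n, n, "refine"


def refine(n: int) -> Tuple[int, int, Optional[str]]:
    # Each pass costs n time units and halves the remaining workload.
    remaining = n // 2
    # Logic-driven decision: repeat until the workload is small enough,
    # so the number of executions depends on the input (dynamic frequency).
    next_block = "refine" if remaining > 1 else "report"
    return n, remaining, next_block


def report(n: int) -> Tuple[int, int, Optional[str]]:
    # Fixed-cost final block that ends the workflow.
    return 1, n, None


def run_workflow(
    blocks: Dict[str, TaskBlock], start: str, workload: int, max_steps: int = 1000
) -> Tuple[int, Dict[str, int]]:
    """Walk the graph once; return total time and per-block execution counts."""
    total_time = 0
    counts: Dict[str, int] = {}
    current: Optional[str] = start
    for _ in range(max_steps):  # guard against accidental infinite loops
        if current is None:
            break
        elapsed, workload, next_block = blocks[current](workload)
        counts[current] = counts.get(current, 0) + 1
        total_time += elapsed
        current = next_block
    return total_time, counts


BLOCKS: Dict[str, TaskBlock] = {"detect": detect, "refine": refine, "report": report}
```

Under these assumed cost models, `run_workflow(BLOCKS, "detect", 8)` executes `refine` three times for a total of 23 time units, while a workload of 2 executes it once for a total of 5. Both the time spent per block and the number of times a block runs are only known at run time, which is the property a static resource allocation cannot anticipate.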