Edge and mobile platforms for augmented and virtual reality, collectively referred to as extended reality (XR), must deliver deterministic, ultra-low-latency performance under stringent power and area constraints. However, XR workloads are rapidly diversifying and are increasingly characterized by heterogeneous operator types and complex dataflow structures. This trend poses significant challenges to conventional accelerator architectures centered on convolutional neural networks (CNNs), and it yields diminishing returns for traditional compute-centric optimization strategies. Despite the importance of this problem, a systematic architectural understanding of the full XR pipeline is still lacking. In this paper, we present an architectural classification of XR workloads using a cross-layer methodology that integrates model-based high-level design space exploration (DSE) with empirical profiling on commercial GPU and CPU hardware. By analyzing a representative set of workloads spanning 12 distinct XR kernels, we distill their complex architectural characteristics into a small set of cross-layer workload archetypes (e.g., capacity-limited and overhead-sensitive). Building on these archetypes, we extract key architectural insights and provide actionable design guidelines for next-generation XR SoCs. Our study highlights that XR architecture design must shift from generic resource scaling toward phase-aware scheduling and elastic resource allocation to achieve higher energy efficiency and performance in future XR systems.