Aquas: Enhancing Domain Specialization through Holistic Hardware-Software Co-Optimization based on MLIR

Application-Specific Instruction-Set Processors (ASIPs) built on the RISC-V architecture offer specialization opportunities for a wide range of applications. Existing frameworks are largely designed around fixed instruction extension interfaces and rely on manual software adaptation. As emerging domains grow in complexity, two major challenges arise. First, memory access remains a primary bottleneck because existing design flows lack architectural awareness of memory interfaces, leading to suboptimal interface selection and orchestration. Second, the semantic complexity of custom instruction extensions, characterized by non-trivial control logic and irregular memory behavior, prevents conventional compilers from offloading computation to them automatically and comprehensively. We present Aquas, a holistic hardware-software co-design framework built upon MLIR. Aquas introduces a memory interface model that jointly considers interface characteristics and cache effects, together with an interface-aware synthesis flow that is guided by this model, progressively optimizes the input specification, and generates efficient hardware implementations. We also propose an e-graph-based retargetable compiler approach with a novel matching engine for efficient instruction mapping and offloading, enabling robust and effective use of custom instructions. Case studies across four diverse domains show that Aquas delivers substantial acceleration, achieving up to 15.61× speedup with 14.5% area overhead and zero frequency degradation, and is highly competitive in domain acceleration with more powerful general-purpose cores and vector extensions.
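To make the idea of e-graph-based instruction mapping concrete, the following is a minimal illustrative sketch and not the Aquas matching engine. It assumes a toy e-graph without congruence closure or rebuilding, a hypothetical multiply-accumulate custom instruction whose pattern is `add(mul(?a, ?b), ?c)`, and a single hand-applied commutativity rewrite; the names `EGraph`, `ematch`, and `MAC` are invented for this example. It shows how recording an equivalent form of an expression lets a pattern match that fails on the original syntax.

```python
class EGraph:
    """Toy e-graph: union-find over e-classes, no congruence closure."""

    def __init__(self):
        self.parent = []   # union-find parent pointers, indexed by e-class id
        self.nodes = {}    # root e-class id -> set of e-nodes (op, child ids...)
        self.memo = {}     # e-node -> e-class id (hash-consing)

    def find(self, a):
        while self.parent[a] != a:
            a = self.parent[a]
        return a

    def add(self, op, *children):
        node = (op, *[self.find(c) for c in children])
        if node in self.memo:
            return self.find(self.memo[node])
        cid = len(self.parent)
        self.parent.append(cid)
        self.nodes[cid] = {node}
        self.memo[node] = cid
        return cid

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
            self.nodes[a] |= self.nodes.pop(b)
        return a


def ematch(eg, pattern, cid, env):
    """Yield variable bindings under which `pattern` matches e-class `cid`."""
    cid = eg.find(cid)
    if isinstance(pattern, str):           # pattern variable such as "?a"
        if pattern in env:
            if eg.find(env[pattern]) == cid:
                yield env
        else:
            yield {**env, pattern: cid}
        return
    op, *subpatterns = pattern
    for node in eg.nodes[cid]:             # try every equivalent e-node
        if node[0] != op or len(node) - 1 != len(subpatterns):
            continue
        envs = [env]
        for sub, child in zip(subpatterns, node[1:]):
            envs = [e2 for e in envs for e2 in ematch(eg, sub, child, e)]
        yield from envs


# Source expression: z + x * y
eg = EGraph()
x, y, z = eg.add("x"), eg.add("y"), eg.add("z")
prod = eg.add("mul", x, y)
root = eg.add("add", z, prod)

# Hypothetical custom instruction pattern: (a * b) + c
MAC = ("add", ("mul", "?a", "?b"), "?c")

before = list(ematch(eg, MAC, root, {}))   # no match on the original form

# Apply commutativity of add by hand: record x * y + z as equivalent to the root.
root = eg.union(root, eg.add("add", prod, z))
after = list(ematch(eg, MAC, root, {}))    # now matches with a=x, b=y, c=z
```

In this sketch the match is found only after the commuted form is merged into the root e-class, which is the general reason an e-graph helps with instruction mapping: the matcher searches over all recorded equivalent forms rather than the one syntactic form the programmer wrote. A production e-graph would additionally maintain congruence and apply rewrite rules automatically.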
