A common optimization for customized accelerators targeting error-tolerant applications such as multimedia, recognition, and classification is to replace conventional arithmetic units, such as multipliers and adders, with approximate counterparts, improving energy efficiency while still meeting accuracy requirements. However, the large number of arithmetic units, combined with the diverse approximate options available for each, results in an exceedingly large design space. An end-to-end design framework capable of navigating this design space for approximation optimization is therefore needed. Traditional methods that rely on simulation-based or black-box model evaluation suffer from either high computational cost or limited accuracy and scalability, which poses significant challenges to the optimization process. In this paper, we propose a Graph Neural Network (GNN) model that leverages the physical connections between arithmetic units to capture their influence on the performance, power, and area (PPA) and the accuracy of the accelerator. In particular, we observe that the critical path plays a key role in the node features of the GNN model, and that embedding it in the feature vector greatly improves the prediction quality of the models. Because these models allow rapid and efficient prediction of PPA and accuracy for different approximate accelerator configurations, we can explore the large design space effectively. On top of them, we build an end-to-end accelerator approximation framework named ApproxPilot. Our experimental results demonstrate that ApproxPilot outperforms state-of-the-art approximation optimization frameworks in both performance and hardware overhead under the same accuracy constraints.
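To make the idea of embedding critical-path information in node features concrete, the following is a minimal sketch and not the ApproxPilot implementation. It assumes the accelerator is modeled as a directed acyclic graph of arithmetic units, that each unit has a known delay and a per-unit error figure, and that the critical path is simply the longest-delay path. The function names, the three-element feature layout, and the mean-aggregation step are all illustrative assumptions; the abstract does not specify them.

```python
from collections import defaultdict


def critical_path_nodes(delays, edges):
    """Return the set of nodes lying on a longest-delay path of a DAG.

    delays: dict mapping unit name -> delay (hypothetical units)
    edges:  list of (src, dst) pairs describing unit-to-unit connections
    """
    succ, pred = defaultdict(list), defaultdict(list)
    indeg = {n: 0 for n in delays}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        indeg[v] += 1

    # Topological order (Kahn's algorithm).
    order, ready = [], [n for n in delays if indeg[n] == 0]
    while ready:
        n = ready.pop()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)

    # arrival[n]: longest delay of any path ending at n (n included).
    arrival = {}
    for n in order:
        arrival[n] = delays[n] + max((arrival[p] for p in pred[n]), default=0.0)

    # tail[n]: longest delay of any path starting at n (n included).
    tail = {}
    for n in reversed(order):
        tail[n] = delays[n] + max((tail[s] for s in succ[n]), default=0.0)

    longest = max(arrival.values())
    # A node is critical if the longest path through it equals the overall longest path.
    return {n for n in delays
            if abs(arrival[n] + tail[n] - delays[n] - longest) < 1e-9}


def node_features(delays, errors, edges):
    """Build an assumed feature vector per unit: [delay, error, on_critical_path]."""
    crit = critical_path_nodes(delays, edges)
    return {n: [delays[n], errors[n], 1.0 if n in crit else 0.0] for n in delays}


def message_pass(features, edges):
    """One round of mean aggregation over each node and its direct neighbors."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    out = {}
    for n, h in features.items():
        group = [h] + [features[m] for m in sorted(nbrs[n])]
        out[n] = [sum(col) / len(group) for col in zip(*group)]
    return out


# Hypothetical four-unit datapath: two multipliers feed an adder chain.
delays = {"mul0": 2.0, "mul1": 1.0, "add0": 1.0, "add1": 0.5}
errors = {"mul0": 0.02, "mul1": 0.05, "add0": 0.0, "add1": 0.01}
edges = [("mul0", "add0"), ("mul1", "add0"), ("add0", "add1")]

features = node_features(delays, errors, edges)
hidden = message_pass(features, edges)
```

In this toy graph the slower multiplier `mul0` and both adders lie on the critical path, while `mul1` does not. After one aggregation round, `mul1` still picks up critical-path information from its neighbor `add0`. A trained GNN would replace the plain mean with learned weights and feed the resulting embeddings to PPA and accuracy predictors.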