AxMoE: Characterizing the Impact of Approximate Multipliers on Mixture-of-Experts DNN Architectures

Deep neural network (DNN) inference at the edge demands high accuracy together with computational and energy efficiency. Approximate computing and Mixture-of-Experts (MoE) architectures have each been studied as independent routes towards efficient inference: the former replaces exact arithmetic with low-power approximate multipliers, while the latter routes inputs through specialized expert sub-networks to enable conditional computation. However, their interaction remains unexplored. This paper presents AxMoE, the first study of the impact of approximate multiplication on MoE DNN architectures. We evaluate three MoE variants (Hard MoE, Soft MoE, and Cluster MoE) against dense baselines across three CNN architectures (ResNet-20, VGG11_bn, VGG19_bn) on CIFAR-100 and a Vision Transformer (ViT-Small) on Tiny ImageNet-200, using eight 8-bit signed multipliers (including one exact baseline) from the EvoApproxLib library. Results show that, without retraining, the Dense baseline is the most resilient topology across all CNN architectures, whereas on ViT-Small all topologies degrade at comparable rates regardless of routing strategy. After approximation-aware retraining, recovery varies substantially across architectures, topologies, and multipliers. ResNet-20 achieves full recovery across the entire multiplier range, whereas the VGG architectures recover at moderate multipliers but fail irreversibly at aggressive ones for all topologies except Cluster MoE on VGG11_bn. On ViT-Small, Hard MoE outperforms Dense under aggressive approximation at equal normalized inference cost. These results pave the way for future approximate MoE hardware-software co-design strategies.
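To make the two ingredients concrete, the following minimal sketch combines an approximate 8-bit signed multiplication with top-1 (hard) expert routing. It is an illustration under stated assumptions, not the AxMoE implementation: `truncated_mul8` is a hypothetical operand-truncation multiplier standing in for the EvoApproxLib circuits, `hard_moe` and all weights are invented for the example, and applying the same multiplier to both the gate and the experts is also an assumption.

```python
# Illustrative sketch only. The truncating multiplier is a hypothetical
# stand-in, NOT one of the EvoApproxLib circuits evaluated in the paper.

def exact_mul8(a: int, b: int) -> int:
    """Exact product of two signed 8-bit integers."""
    return a * b


def truncated_mul8(a: int, b: int, dropped_bits: int = 2) -> int:
    """Approximate product: clear the low bits of each operand's magnitude."""
    sign = -1 if (a < 0) != (b < 0) else 1
    mask = ~((1 << dropped_bits) - 1)
    return sign * ((abs(a) & mask) * (abs(b) & mask))


def dot(xs, ws, mul):
    """Dot product in which every multiplication goes through `mul`."""
    return sum(mul(x, w) for x, w in zip(xs, ws))


def hard_moe(x, gate_weights, expert_weights, mul):
    """Top-1 routing: score every expert with the gate, run only the winner.

    Assumption: the gate and the experts share the same multiplier.
    """
    scores = [dot(x, g, mul) for g in gate_weights]
    chosen = max(range(len(scores)), key=scores.__getitem__)
    return chosen, dot(x, expert_weights[chosen], mul)


# Made-up int8 input, gate weights, and single-neuron "experts".
x = [17, -42, 99, 5]
gate_weights = [[3, -1, 2, 7], [-4, 6, 1, -2]]
expert_weights = [[10, 20, -30, 40], [-15, 25, 35, -45]]

exact_result = hard_moe(x, gate_weights, expert_weights, exact_mul8)
approx_result = hard_moe(x, gate_weights, expert_weights, truncated_mul8)
print("exact :", exact_result)
print("approx:", approx_result)
```

In this toy case both multipliers route the input to the same expert, but the expert output differs. With other inputs the approximation error could also change the gate's decision, which is one way routing and arithmetic error might interact.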
