Deploying dynamic neural networks on edge accelerators requires careful consideration of hardware constraints beyond conventional complexity metrics such as multiply-accumulate (MAC) operation counts. In Early-Exiting Neural Networks (EENNs), exit placement, quantization level, and hardware workload mapping interact in non-trivial ways, influencing memory traffic, accelerator utilization, and ultimately energy-latency trade-offs. These interactions remain insufficiently understood in existing Neural Architecture Search (NAS) approaches, which typically rely on proxy metrics or hardware-in-the-loop evaluation. This work presents a hardware-algorithm co-design framework for EENNs that explicitly models the interplay between quantization, exit configuration, and multi-core accelerator mapping. Using analytical design space exploration, we characterize how small architectural variations can induce disproportionate changes in hardware efficiency due to tensor dimension alignment and dataflow effects. Building on this analysis, we formulate EENN deployment as a constrained multi-objective optimization problem balancing accuracy, energy-latency product, exit overhead, and dynamic inference behavior. Experimental results on CIFAR-10 demonstrate that the proposed framework identifies architectures achieving over 50% reduction in energy-latency product compared to static baselines under 8-bit quantization. The results highlight the importance of deployment-aware co-design for dynamic inference on heterogeneous edge platforms.
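As a rough illustration of the tensor dimension alignment effect described in the abstract, the sketch below uses a toy cost model rather than the analytical model of this work. It assumes a hypothetical accelerator that processes output channels in groups of 16 lanes, where a partially filled group costs as much as a full one. The lane count, the function names `tiled_cycles` and `utilization`, and the channel counts are all assumptions chosen for illustration.

```python
import math


def tiled_cycles(channels: int, lanes: int) -> int:
    """Number of lane-groups (tiles) needed to cover `channels` outputs.

    Toy assumption: every tile costs one time unit, even if only
    partially filled.
    """
    return math.ceil(channels / lanes)


def utilization(channels: int, lanes: int) -> float:
    """Fraction of lane slots doing useful work under the same assumption."""
    return channels / (tiled_cycles(channels, lanes) * lanes)


LANES = 16  # hypothetical lane count, not taken from the paper

base = tiled_cycles(64, LANES)  # 64 channels fill 4 tiles exactly
wide = tiled_cycles(65, LANES)  # 65 channels spill into a 5th tile

mac_growth = 65 / 64 - 1        # relative increase in useful work
cycle_growth = wide / base - 1  # relative increase in time units

print(f"MAC growth:   {mac_growth:.2%}")
print(f"cycle growth: {cycle_growth:.2%}")
print(f"utilization:  {utilization(64, LANES):.2%} -> {utilization(65, LANES):.2%}")
```

In this toy model, adding a single channel raises the operation count by under 2% but the cycle count by 25%, which is the kind of disproportionate change that a MAC-based proxy metric would not reveal.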