Neural Networks (NNs) provide a solid and reliable way of executing different types of applications, ranging from speech recognition to medical diagnosis, and speed up onerous, long-running workloads. The challenges involved in implementing them at the edge include providing diversity, flexibility, and sustainability, which implies, for instance, supporting evolving applications and algorithms in an energy-efficient way. Hardware or software accelerators can deliver fast and efficient computation of NNs, while flexibility can be exploited to support long-term adaptivity. Nonetheless, handcrafting an NN for a specific device, although it can lead to an optimal solution, takes time and experience, which is why frameworks for hardware accelerators are being developed. This work-in-progress study explores the possibility of combining the toolchain proposed by Ratto et al., which has the distinctive ability to favor adaptivity, with approximate computing. The goal is to enable lightweight, adaptable NN inference on FPGAs at the edge. Before that, the work presents a detailed review of established frameworks that adopt a similar streaming architecture, for future comparison.