The role of the network edge in Artificial Intelligence (AI) inference processing is expanding rapidly, driven by a growing number of applications that seek computational advantages. These applications aim for data-driven efficiency, rely on robust AI capabilities, and prioritize real-time responsiveness. However, as demand grows, so does system complexity. The proliferation of AI inference accelerators showcases innovation but also underscores challenges, particularly the varied software and hardware configurations of these devices. This diversity, while advantageous for certain tasks, introduces hurdles in device integration and coordination. In this paper, our objectives are three-fold. First, we outline the requirements and components of a framework that accommodates hardware diversity. Second, we assess the impact of device heterogeneity on AI inference performance and identify strategies to optimize outcomes without compromising service quality. Third, we highlight the prevailing challenges and opportunities in this domain, offering insights for both the research community and industry stakeholders.