The advent of tiny artificial intelligence (AI) accelerators enables AI to run at the extreme edge, offering reduced latency, lower power cost, and improved privacy. When integrated into wearable devices, these accelerators allow a variety of AI apps to run directly on the body. We present Synergy, a system that provides AI apps with best-effort performance via system-driven holistic collaboration over AI accelerator-equipped wearables. To achieve this, Synergy exposes device-agnostic programming interfaces to AI apps, giving the system visibility into and control over each app's resource use. Synergy then maximizes the inference throughput of concurrent AI models by generating multiple candidate execution plans for each app, taking AI accelerator availability into account, and selecting the best set of execution plans across apps. Synergy further improves throughput by exploiting parallelization opportunities across multiple computation units. Our evaluations with 7 baselines and 8 models demonstrate that, on average, Synergy achieves a 23.0× improvement in throughput while reducing latency by 73.9% and power consumption by 15.8%, compared to the baselines.
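The plan-selection step described above can be illustrated with a small sketch. This is not Synergy's actual algorithm, which the abstract does not specify; it is a brute-force toy under stated assumptions: each candidate plan names the devices it occupies and carries an estimated throughput, a device can serve only one plan at a time, and the objective is the sum of per-app throughputs. The function `select_plans`, the app and device names, and all throughput values are hypothetical and chosen only for illustration.

```python
from itertools import product


def select_plans(candidates):
    """Pick one plan per app so that no device is shared and total throughput is maximal.

    candidates: dict mapping app name -> list of (plan_name, devices, throughput),
    where devices is a set of device names the plan occupies (assumed exclusive use).
    Returns (chosen, total): chosen maps app -> plan_name, or is None if no
    conflict-free combination exists.
    """
    apps = list(candidates)
    best_choice, best_total = None, float("-inf")

    # Exhaustively enumerate one plan per app (fine for a handful of apps and plans).
    for combo in product(*(candidates[app] for app in apps)):
        used = set()
        feasible = True
        for _, devices, _ in combo:
            if used & devices:  # two plans want the same device
                feasible = False
                break
            used |= devices
        if not feasible:
            continue

        total = sum(throughput for _, _, throughput in combo)
        if total > best_total:
            best_total = total
            best_choice = {app: plan[0] for app, plan in zip(apps, combo)}

    return best_choice, best_total


# Hypothetical candidates; throughputs are made-up inferences per second.
candidates = {
    "keyword_spotting": [
        ("watch", {"watch"}, 10.0),
        ("ring", {"ring"}, 6.0),
        ("watch+ring", {"watch", "ring"}, 14.0),  # model split across two devices
    ],
    "face_detection": [
        ("watch", {"watch"}, 8.0),
        ("glasses", {"glasses"}, 5.0),
    ],
}

choice, total = select_plans(candidates)
print(choice, total)
```

With these made-up numbers, the best conflict-free combination splits keyword spotting across the watch and ring and places face detection on the glasses. Exhaustive enumeration grows exponentially with the number of apps, so a practical system would likely need pruning or a heuristic; the sketch only shows the shape of the selection problem.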