For deep learning inference on edge devices, hardware configurations that achieve the same throughput can differ by 2$\times$ in power consumption, yet operators often struggle to find the efficient ones without exhaustive profiling. Existing approaches rely on inefficient static presets or require expensive offline profiling that must be repeated for each new model or device. To address this problem, we present CORAL, an online optimization method that discovers near-optimal configurations without offline profiling. CORAL leverages distance covariance to statistically capture the non-linear dependencies between hardware settings, e.g., DVFS and concurrency levels, and performance metrics. Unlike prior work, we explicitly formulate the challenge as a throughput-power co-optimization problem that must satisfy power budgets and throughput targets simultaneously. We evaluate CORAL on two NVIDIA Jetson devices across three object detection models ranging from lightweight to heavyweight. In single-target scenarios, CORAL achieves 96%–100% of the optimal performance found by exhaustive search. In strict dual-constraint scenarios, where baselines either miss the throughput target or exceed the power budget, CORAL consistently finds valid configurations online with minimal exploration.
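The abstract's key statistical tool, distance covariance, detects dependencies that linear correlation misses. Below is a minimal sketch of the standard sample (V-statistic) distance covariance, for illustration only; it is not CORAL's implementation, and the function name and the quadratic relationship used in the demo are assumptions chosen for the example.

```python
import numpy as np

def distance_covariance(x, y):
    """Sample distance covariance between two 1-D samples (Szekely et al.).
    It is zero in the limit iff x and y are independent, so it also
    captures non-linear dependence. Illustrative sketch, not CORAL's code."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # Pairwise distance matrices of each sample.
    a = np.abs(x - x.T)
    b = np.abs(y - y.T)
    # Double-center: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1, keepdims=True) + b.mean()
    # Squared sample distance covariance is the mean elementwise product,
    # which is non-negative for the V-statistic form.
    return np.sqrt(np.mean(A * B))

# A purely quadratic relationship has ~0 Pearson correlation on this
# symmetric grid, yet a clearly positive distance covariance.
x = np.linspace(-1.0, 1.0, 50)
print(distance_covariance(x, x ** 2))
```

This is the kind of signal a hardware tuner can use: a performance metric whose response to a DVFS level is U-shaped or saturating still registers as dependent, whereas Pearson correlation can report near zero.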