Real-time multi-label video classification on embedded devices is constrained by limited compute and energy budgets. Yet video streams exhibit structural properties, such as label sparsity, temporal continuity, and label co-occurrence, that can be leveraged for more efficient inference. We introduce Polymorph, a context-aware framework that activates a minimal set of lightweight Low-Rank Adapters (LoRA) per frame. Each adapter specializes in a subset of classes derived from co-occurrence patterns and is implemented as a LoRA delta over a shared backbone. At runtime, Polymorph dynamically selects and composes only the adapters needed to cover the active labels, avoiding full-model switching and weight merging. This modular strategy improves scalability while reducing latency and energy overhead. Polymorph achieves 40% lower energy consumption and improves mAP by 9 points over strong baselines on the TAO dataset. Polymorph is open source at https://github.com/inference-serving/polymorph/.
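The runtime composition idea can be illustrated with a minimal sketch: a shared backbone layer plus low-rank deltas that are applied only for the adapters covering a frame's active labels, with no weight merging. All names here (`LoRAAdapter`, `compose_forward`, the label-cluster names) are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, rank = 16, 8, 2
W = rng.standard_normal((d_out, d_in))  # shared backbone weight (frozen)

class LoRAAdapter:
    """Low-rank delta: W_delta = scale * B @ A,
    with A of shape (rank, d_in) and B of shape (d_out, rank)."""
    def __init__(self, d_in, d_out, rank, scale=1.0):
        self.A = rng.standard_normal((rank, d_in)) * 0.01
        self.B = rng.standard_normal((d_out, rank)) * 0.01
        self.scale = scale

    def delta(self, x):
        # Two small matmuls instead of materializing the full d_out x d_in delta.
        return self.scale * (self.B @ (self.A @ x))

# One adapter per label cluster (clusters here are made-up examples; the
# paper derives them from label co-occurrence patterns).
adapters = {
    "vehicles": LoRAAdapter(d_in, d_out, rank),
    "animals": LoRAAdapter(d_in, d_out, rank),
    "people": LoRAAdapter(d_in, d_out, rank),
}

def compose_forward(x, active):
    """Backbone output plus deltas from only the active adapters:
    no full-model switch and no merging of adapter weights into W."""
    y = W @ x
    for name in active:
        y = y + adapters[name].delta(x)
    return y

x = rng.standard_normal(d_in)
y = compose_forward(x, active=["vehicles", "people"])  # this frame's labels
```

Because each delta is applied additively at inference time, adding or dropping an adapter between frames only changes which low-rank products are summed, which is what makes per-frame selection cheap compared with swapping or merging full model weights.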