Polymorph: Energy-Efficient Multi-Label Classification for Video Streams on Embedded Devices

Saeid Ghafouri, Mohsen Fayyaz, Xiangchen Li, Deepu John, Bo Ji, Dimitrios Nikolopoulos, Hans Vandierendonck

From arXiv. Accepted at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2026).

Real-time multi-label video classification on embedded devices is constrained by limited compute and energy budgets. Yet video streams exhibit structural properties, such as label sparsity, temporal continuity, and label co-occurrence, that can be leveraged for more efficient inference. We introduce Polymorph, a context-aware framework that activates a minimal set of lightweight Low-Rank Adapters (LoRA) per frame. Each adapter specializes in a subset of classes derived from co-occurrence patterns and is implemented as a set of LoRA weights over a shared backbone. At runtime, Polymorph dynamically selects and composes only the adapters needed to cover the active labels, avoiding full-model switching and weight merging. This modular strategy improves scalability while reducing latency and energy overhead. On the TAO dataset, Polymorph achieves 40% lower energy consumption and improves mAP by 9 points over strong baselines. Polymorph is open source at https://github.com/inference-serving/polymorph/.
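The runtime step described in the abstract, choosing a small set of adapters whose class subsets cover the labels currently active, can be illustrated with a short Python sketch. This is not the authors' implementation: the greedy set-cover heuristic, the function name `select_adapters`, and the adapter names and class groups in `ADAPTERS` are all assumptions made for illustration only.

```python
from typing import Dict, FrozenSet, List, Set


def select_adapters(
    active_labels: Set[str],
    adapter_classes: Dict[str, FrozenSet[str]],
) -> List[str]:
    """Greedily pick adapters until every active label is covered.

    Greedy set cover is an assumption made for illustration; it is not
    taken from the Polymorph paper or its code base.
    """
    uncovered = set(active_labels)
    chosen: List[str] = []
    while uncovered:
        # Pick the adapter covering the most still-uncovered labels;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(
            sorted(adapter_classes),
            key=lambda name: len(adapter_classes[name] & uncovered),
        )
        gained = adapter_classes[best] & uncovered
        if not gained:
            raise ValueError(f"no adapter covers labels: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical adapters, each owning a group of classes that tend to co-occur.
ADAPTERS = {
    "street": frozenset({"car", "bus", "person", "bicycle"}),
    "home": frozenset({"person", "dog", "cat", "sofa"}),
    "farm": frozenset({"horse", "cow", "sheep", "dog"}),
}

print(select_adapters({"car", "person"}, ADAPTERS))
print(select_adapters({"car", "dog"}, ADAPTERS))
```

When the active labels all fall inside one co-occurrence group, a single adapter suffices; labels that span groups require more than one. Because labels in video tend to be sparse and to persist across neighbouring frames, the selected set would be expected to stay small and change rarely, which is the property the abstract credits for the latency and energy savings.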
