The rapid evolution of multimedia and computer vision technologies requires adaptive visual model deployment strategies to effectively handle diverse tasks and varying environments. This work introduces AxiomVision, a novel framework that guarantees accuracy by leveraging edge computing to dynamically select the most efficient visual model for video analytics across diverse scenarios. Built on a tiered edge-cloud architecture, AxiomVision supports the deployment of a broad spectrum of visual models, from lightweight networks to complex DNNs, tailored to specific scenarios while accounting for the impact of the camera source. In addition, AxiomVision provides three core innovations: (1) a dynamic visual model selection mechanism based on continual online learning, (2) an efficient online method that accounts for the influence of the camera's perspective, and (3) a topology-driven grouping approach that accelerates the model selection process. With rigorous theoretical guarantees, these advancements provide a scalable and effective solution for visual tasks inherent to multimedia systems, such as object detection, classification, and counting. Empirically, AxiomVision achieves a 25.7\% improvement in accuracy.
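To illustrate the online model-selection idea, the sketch below uses a simple UCB1-style bandit that chooses among candidate visual models and updates from observed accuracy feedback. This is a hypothetical minimal example, not AxiomVision's actual algorithm: the model names, reward signal, and the absence of perspective-aware grouping are all assumptions for illustration.

```python
import math
import random

class ModelSelector:
    """Minimal UCB1-style online selector over candidate visual models.

    Hypothetical sketch of dynamic model selection via online learning;
    AxiomVision's actual method (with camera-perspective grouping and
    theoretical guarantees) is more involved.
    """

    def __init__(self, model_names):
        self.models = list(model_names)
        self.counts = {m: 0 for m in self.models}    # times each model was chosen
        self.rewards = {m: 0.0 for m in self.models}  # cumulative accuracy feedback

    def select(self):
        # Try each model once, then balance exploration and exploitation.
        for m in self.models:
            if self.counts[m] == 0:
                return m
        total = sum(self.counts.values())

        def ucb(m):
            mean = self.rewards[m] / self.counts[m]
            bonus = math.sqrt(2.0 * math.log(total) / self.counts[m])
            return mean + bonus

        return max(self.models, key=ucb)

    def update(self, model, accuracy):
        # Feed back observed accuracy (e.g., agreement with a reference model).
        self.counts[model] += 1
        self.rewards[model] += accuracy

# Toy usage with made-up model names and accuracies: the selector converges
# toward the model with the highest true accuracy.
random.seed(0)
true_acc = {"mobilenet_ssd": 0.6, "yolo_small": 0.7, "yolo_large": 0.9}
sel = ModelSelector(true_acc)
for _ in range(500):
    m = sel.select()
    sel.update(m, 1.0 if random.random() < true_acc[m] else 0.0)
best = max(sel.counts, key=sel.counts.get)
```

In a deployment along the lines described above, each camera group would run its own selector, so that feedback from one perspective does not mislead model choice for another.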