The rapid evolution of multimedia and computer vision technologies requires adaptive visual model deployment strategies to handle diverse tasks and varying environments effectively. This work introduces AxiomVision, a novel framework that guarantees accuracy by leveraging edge computing to dynamically select the most efficient visual model for video analytics under diverse scenarios. Built on a tiered edge-cloud architecture, AxiomVision supports the deployment of a broad spectrum of visual models, from lightweight networks to complex DNNs, tailored to specific scenarios while accounting for the impact of the camera source. AxiomVision provides three core innovations: (1) a dynamic visual model selection mechanism based on continual online learning, (2) an efficient online method that accounts for the influence of the camera's perspective, and (3) a topology-driven grouping approach that accelerates model selection. Backed by rigorous theoretical guarantees, these advances provide a scalable and effective solution for visual tasks inherent to multimedia systems, such as object detection, classification, and counting. Empirically, AxiomVision achieves a 25.7\% improvement in accuracy.
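The continual-online-learning selection mechanism can be illustrated as a bandit-style loop over candidate models: each candidate is an arm, and observed accuracy feedback updates its estimate. The sketch below uses a standard UCB1 rule as a stand-in; the model names, the feedback signal, and the UCB1 choice are illustrative assumptions, not AxiomVision's actual algorithm or API.

```python
import math
import random


class ModelSelector:
    """UCB1-style online selector over candidate visual models.

    A minimal sketch of continual online model selection: each candidate
    model is treated as a bandit arm, and an observed accuracy reward
    updates that arm's running mean. Model names are hypothetical.
    """

    def __init__(self, models):
        self.models = list(models)
        self.counts = {m: 0 for m in self.models}
        self.means = {m: 0.0 for m in self.models}
        self.t = 0

    def select(self):
        self.t += 1
        # Try every model once before exploiting.
        for m in self.models:
            if self.counts[m] == 0:
                return m
        # UCB1: estimated accuracy plus an exploration bonus that
        # shrinks as a model accumulates observations.
        return max(
            self.models,
            key=lambda m: self.means[m]
            + math.sqrt(2 * math.log(self.t) / self.counts[m]),
        )

    def update(self, model, accuracy):
        # Incremental update of the running mean accuracy.
        self.counts[model] += 1
        self.means[model] += (accuracy - self.means[model]) / self.counts[model]


# Simulated feedback: hypothetical per-model ground-truth accuracies.
random.seed(0)
true_acc = {"yolov5n": 0.60, "yolov5m": 0.75, "yolov5x": 0.72}
selector = ModelSelector(true_acc)
for _ in range(2000):
    chosen = selector.select()
    # Noisy accuracy observation stands in for real video-analytics feedback.
    selector.update(chosen, true_acc[chosen] + random.gauss(0, 0.05))
```

Over enough rounds the selector concentrates its choices on the most accurate model while still occasionally probing the others, which is the behavior the dynamic selection mechanism needs when scene conditions can shift.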