In this paper, we introduce MVSparse, an efficient framework for cooperative multi-person tracking across multiple synchronized cameras. MVSparse comprises a pipeline that combines models running on an edge server with distributed, lightweight Reinforcement Learning (RL) agents operating on the individual cameras. These RL agents select informative blocks within each frame based on historical camera data and detection outcomes from neighboring cameras, which reduces computational load and communication overhead. The edge server aggregates the camera views to perform detection and provides feedback to the individual agents. By projecting inputs from the different perspectives onto a common ground plane and applying deep detection models, MVSparse exploits the temporal and spatial redundancy in multi-view videos. Our contributions include an empirical analysis of multi-camera pedestrian tracking datasets, the development of a multi-camera, multi-person detection pipeline, and the implementation of MVSparse, which we evaluate on both open datasets and real-world scenarios. Experimentally, MVSparse accelerates overall inference by 1.88X and 1.60X compared to a baseline approach while reducing tracking accuracy by only 2.27% and 3.17%, respectively, indicating its potential for efficient multi-camera tracking applications.
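The ground-plane projection mentioned above can be illustrated with a planar homography. The following is a minimal sketch under stated assumptions, not the MVSparse implementation: the function name `project_to_ground`, the matrix `H_cam`, and the sample pixel coordinates are all hypothetical, and the actual system may project feature maps rather than individual points.

```python
import numpy as np

def project_to_ground(points_px, H):
    """Map N x 2 pixel coordinates to ground-plane coordinates
    using a 3 x 3 homography H (assumed known from calibration)."""
    pts = np.asarray(points_px, dtype=float)
    # Append a column of ones to obtain homogeneous coordinates.
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])
    mapped = homog @ H.T
    # Divide by the last coordinate to return to Cartesian form.
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical homography for one camera: scale pixels by 0.01, then shift.
H_cam = np.array([[0.01, 0.0, -1.0],
                  [0.0, 0.01, -2.0],
                  [0.0, 0.0, 1.0]])

# Hypothetical foot positions of two detected pedestrians, in pixels.
feet_px = [[100.0, 200.0], [300.0, 400.0]]
ground = project_to_ground(feet_px, H_cam)
```

With one such homography per camera, detections from all views land in a shared coordinate frame, which is what allows a server to aggregate them.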