Multi-camera tracking with overlapping fields of view typically relies on centralized fusion, which creates computational bottlenecks that prevent deployment at scale. We present MV3DT, a fully distributed framework for real-time multi-view 3D tracking that achieves accurate identity propagation and occlusion recovery through peer-to-peer coordination, eliminating the need for central aggregation. Each camera node executes a lightweight modular pipeline comprising monocular 3D perception, distributed multi-view association, and collaborative fusion via lightweight messaging. On WILDTRACK, MV3DT achieves 96.5% IDF1, 93.1% MOTA, and 94.6% MOTP, competitive with state-of-the-art centralized methods; on SCOUT, it achieves an unprecedented 41.7% IDF1 and 50.9% MOTA. It also demonstrates superior scalability, sustaining 30 FPS on 100 cameras with <10 ms inter-camera latency and only 2.2% communication overhead. Given camera calibrations, MV3DT operates in a zero-shot regime: it requires no scene-specific learning and is directly deployable in new environments. These results establish MV3DT as a practical solution for real-time multi-view tracking in large-scale overlapping camera networks.
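
For readers unfamiliar with the reported metrics, the following minimal sketch shows how MOTA and IDF1 are conventionally computed from aggregate error counts. It uses the standard CLEAR MOT and identity-metric definitions, not code from MV3DT. The function names and the example counts are hypothetical and chosen only for illustration.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """Standard CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT.

    All counts are summed over every frame of the sequence.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Standard identity F1: 2*IDTP / (2*IDTP + IDFP + IDFN).

    IDTP, IDFP, and IDFN are the identity-level true positives, false
    positives, and false negatives under the best global matching of
    predicted to ground-truth trajectories.
    """
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


# Hypothetical counts, for illustration only (not results from the paper).
example_mota = mota(false_negatives=5, false_positives=3,
                    id_switches=2, num_gt=100)
example_idf1 = idf1(idtp=80, idfp=10, idfn=30)
```

MOTA can be negative when the total number of errors exceeds the number of ground-truth objects. IDF1 rewards keeping the same identity over a whole trajectory, which makes it the more direct measure of the identity propagation that the abstract emphasizes.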