We propose STAC, an efficient cross-camera surveillance system that leverages spatio-temporal associations between multiple cameras to provide real-time analytics and inference in constrained network environments. STAC is built on the proposed omni-scale feature learning person re-identification (ReID) algorithm, which uses the spatio-temporal characteristics of video frames to accurately detect, track, and re-identify people across cameras. We integrate STAC with frame filtering and a state-of-the-art streaming compression technique (the FFmpeg libx264 codec) to remove redundant information from cross-camera frames. This reduces the cost of both video transmission and computation while maintaining high accuracy for real-time query inference. NVIDIA's release of the AICity Challenge 2023 dataset [1] has enabled the exploration of systems built on multi-camera people tracking algorithms. We evaluate STAC on this dataset, measuring ReID accuracy metrics and inference rate. We also quantify the reduction in streamed video data that frame filtering and FFmpeg compression achieve relative to the raw camera streams. To support reproduction of our results, we make our repository available at https://github.com/VolodymyrVakhniuk/CS444_Final_Project.