Understanding Policy and Technical Aspects of AI-Enabled Smart Video Surveillance to Address Public Safety

Recent advancements in artificial intelligence (AI) have led to the emergence of smart video surveillance (SVS) in many practical applications, particularly for building safer and more secure communities in urban environments. Cognitive tasks such as identifying objects, recognizing actions, and detecting anomalous behaviors produce data that, through statistical and analytical tools, can provide valuable insights to the community. However, the design of AI-enabled surveillance systems requires special consideration of ethical challenges and concerns. In particular, the use and storage of personally identifiable information (PII) commonly pose an increased risk to personal privacy. To address these issues, this paper identifies the privacy concerns and requirements that must be addressed when designing AI-enabled smart video surveillance. Further, we propose the first end-to-end AI-enabled privacy-preserving smart video surveillance system that holistically combines computer vision analytics, statistical data analytics, cloud-native services, and end-user applications. Finally, we propose quantitative and qualitative metrics to evaluate intelligent video surveillance systems. The system achieves a processing rate of 17.8 frames per second (FPS) in extreme video scenes. However, designing for privacy means preferring pose-based algorithms over pixel-based ones, and this choice reduces accuracy in both tasks: results drop from 97.48 to 73.72 in anomaly detection and from 96 to 83.07 in action detection. On average, the latency of the end-to-end system is 36.1 seconds.
