CCTV-based vehicle tracking systems face a structural limitation: they struggle to link the trajectory of a single vehicle continuously across multiple cameras. In particular, the gaps between cameras and their limited fields of view (FOV) create blind spots, which cause object ID switches and trajectory loss and thereby reduce the reliability of real-time path prediction. This paper proposes SPOT (Spatial Prediction Over Trajectories), a map-guided LLM agent that tracks vehicles through the blind spots of multi-CCTV environments without prior training. The method represents road structure (waypoints) and CCTV placement as documents based on 2D spatial coordinates and organizes them by chunking so that they can be queried and reasoned over in real time. It also converts a vehicle's position into the real-world coordinate system using the relative position of the object observed in the CCTV image together with the FOV information. By combining map spatial information with the vehicle's direction of travel, speed, and driving pattern, SPOT runs a beam search at the intersection level to derive the candidate CCTV locations where the vehicle is most likely to reappear after the blind spot. Experiments in a virtual city environment built on the CARLA simulator show that the method accurately predicts the next CCTV in which the vehicle appears, even across blind-spot sections, and maintains continuous vehicle trajectories more effectively than existing techniques.
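To make the intersection-level beam search concrete, the following is a minimal sketch and not the authors' implementation. Everything in it is an assumption made for illustration: the toy map (`NODES`, `EDGES`), the camera ids in `CCTV_AT`, the function names, and above all the scoring rule, which uses only how well each outgoing road segment aligns with the vehicle's heading. The abstract also mentions speed, driving patterns, and an LLM agent, none of which are modeled here.

```python
import math

# Hypothetical toy map: intersection id -> (x, y) world coordinates in metres.
NODES = {"A": (0, 0), "B": (100, 0), "C": (100, 100), "D": (200, 0), "E": (100, -100)}
# Directed road segments between intersections (assumed topology).
EDGES = {"A": ["B"], "B": ["C", "D", "E"], "C": [], "D": [], "E": []}
# Intersections covered by a camera (hypothetical camera ids).
CCTV_AT = {"C": "cam_north", "D": "cam_east", "E": "cam_south"}


def unit(v):
    """Return v scaled to unit length, or (0, 0) for a zero vector."""
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n) if n else (0.0, 0.0)


def segment(src, dst):
    """Vector pointing from intersection src to intersection dst."""
    return (NODES[dst][0] - NODES[src][0], NODES[dst][1] - NODES[src][1])


def turn_score(heading, src, dst):
    """Assumed score in (0, 1]: 1 for going straight, near 0 for a U-turn."""
    h, d = unit(heading), unit(segment(src, dst))
    cos = h[0] * d[0] + h[1] * d[1]
    return max((1.0 + cos) / 2.0, 1e-6)


def predict_next_cctv(start, heading, beam_width=2, max_depth=4):
    """Rank cameras by the best log-score of any beam path that reaches them."""
    beam = [(0.0, start, heading)]  # (cumulative log-score, node, heading)
    found = {}
    for _ in range(max_depth):
        expanded = []
        for logp, node, head in beam:
            for nxt in EDGES.get(node, []):
                score = logp + math.log(turn_score(head, node, nxt))
                if nxt in CCTV_AT:
                    # The path leaves the blind spot here: record the camera.
                    cam = CCTV_AT[nxt]
                    found[cam] = max(found.get(cam, -math.inf), score)
                else:
                    # Still unobserved: continue with the segment as new heading.
                    expanded.append((score, nxt, segment(node, nxt)))
        # Keep only the beam_width most plausible unobserved paths.
        beam = sorted(expanded, key=lambda t: t[0], reverse=True)[:beam_width]
        if not beam:
            break
    return sorted(found.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Vehicle last seen at A, travelling east.
    for cam, logp in predict_next_cctv("A", (1.0, 0.0)):
        print(cam, round(logp, 3))
```

In this toy map the straight-ahead camera ranks first, because both segments on the route to it align with the heading. A fuller version would add speed and observed driving patterns to the score, as the abstract describes.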