Consistent spatial-temporal coordination across multiple agents is fundamental to collaborative perception, which seeks to improve perception through information exchange among agents. To achieve this spatial-temporal alignment, traditional methods depend on external devices to provide localization and clock signals. However, hardware-generated signals are vulnerable to noise and to malicious attacks, which jeopardizes the precision of the alignment. Rather than relying on external hardware, this work takes a different approach: aligning agents by recognizing the inherent geometric patterns within their perceptual data. Following this idea, we propose a robust collaborative perception system that operates independently of external localization and clock devices. The key module of our system,~\emph{FreeAlign}, constructs a salient object graph for each agent based on its detected boxes and uses a graph neural network to identify common subgraphs between agents, yielding accurate estimates of relative pose and time. We validate~\emph{FreeAlign} on both real-world and simulated datasets. The results show that the~\emph{FreeAlign}-empowered robust collaborative perception system performs comparably to systems relying on precise localization and clock devices.
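To make the geometric idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that box centers are 2D points in each agent's own frame and that the correspondence between the two agents' boxes is already known (in the paper this matching comes from the graph neural network, which is not reproduced here). The function names `pairwise_distances`, `estimate_relative_pose`, and `apply_pose` are hypothetical. The sketch shows two things: distances between box centers are unchanged by a rigid transform, which is what makes them usable as graph edge features, and a relative pose can be recovered from matched centers with a closed-form least-squares fit.

```python
import math


def pairwise_distances(centers):
    """Edge features of an object graph: distances between box centers.

    These are invariant to rotation and translation, so two agents that
    see the same objects obtain the same values in their own frames.
    """
    n = len(centers)
    return [[math.dist(centers[i], centers[j]) for j in range(n)] for i in range(n)]


def apply_pose(points, theta, tx, ty):
    """Rotate each 2D point by theta (radians), then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def estimate_relative_pose(src, dst):
    """Least-squares 2D rigid fit so that dst[i] ~ R(theta) * src[i] + t.

    Assumes src[i] and dst[i] are the same physical object (matching is
    taken as given here). Returns (theta, tx, ty).
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched box centers")
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    # Accumulate dot and cross products of the centered point pairs.
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy
        bx, by = qx - dx, qy - dy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, tx, ty


if __name__ == "__main__":
    # Made-up box centers seen by agent A, and the same objects in agent B's frame.
    boxes_a = [(0.0, 0.0), (4.0, 1.0), (1.0, 5.0), (-3.0, 2.0)]
    boxes_b = apply_pose(boxes_a, 0.3, 2.0, -1.0)
    print(estimate_relative_pose(boxes_a, boxes_b))
```

With noisy or partially wrong matches, a fit like this would normally be wrapped in an outlier-rejection step; that is left out to keep the sketch short.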