Consistent spatial-temporal coordination across multiple agents is fundamental for collaborative perception, which seeks to improve perception ability through information exchange among agents. To achieve this spatial-temporal alignment, traditional methods depend on external devices to provide localization and clock signals. However, hardware-generated signals can be vulnerable to noise and potentially malicious attacks, jeopardizing the precision of spatial-temporal alignment. Rather than relying on external hardware, this work proposes a novel approach: aligning agents by recognizing the inherent geometric patterns within their perceptual data. Following this spirit, we propose a robust collaborative perception system that operates independently of external localization and clock devices. The key module of our system,~\emph{FreeAlign}, constructs a salient object graph for each agent based on its detected boxes and uses a graph neural network to identify common subgraphs between agents, yielding accurate relative pose and time estimates. We validate \emph{FreeAlign} on both real-world and simulated datasets. The results show that the \emph{FreeAlign}-empowered robust collaborative perception system performs comparably to systems relying on precise localization and clock devices.
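To illustrate the geometric intuition behind the abstract, the sketch below (an assumption-laden illustration, not the paper's implementation) shows why a graph built from pairwise distances between detected box centers is invariant to the unknown relative rotation and translation between agents, and how a relative pose can be recovered from matched box correspondences with the classic Kabsch/Umeyama procedure. The function names `build_object_graph` and `estimate_relative_pose` are hypothetical; the learned graph-neural-network subgraph matching described in the abstract is not reproduced here, and correspondences are simply assumed given.

```python
import numpy as np

def build_object_graph(centers):
    """Fully connected graph over detected box centers (N, 2).

    Edge weights are pairwise Euclidean distances, which are invariant
    to any rigid transform between agents' coordinate frames.
    (Hypothetical helper; not the paper's actual graph construction.)
    """
    diff = centers[:, None, :] - centers[None, :, :]
    return np.linalg.norm(diff, axis=-1)  # (N, N) distance matrix

def estimate_relative_pose(src, dst):
    """Recover the 2D rigid transform (R, t) with dst ~= src @ R.T + t,
    given matched box centers, via the Kabsch/Umeyama SVD procedure."""
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    # Cross-covariance of the centered correspondences.
    H = src_c.T @ dst_c
    U, _, Vt = np.linalg.svd(H)
    # Reflection correction keeps det(R) = +1 (a proper rotation).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t
```

Because the distance matrices of two agents agree on commonly observed objects regardless of each agent's pose, matching subgraphs of these invariant graphs can stand in for hardware localization; once correspondences are found, the pose solve above is closed-form.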