Cooperatively using ego-vehicle and infrastructure sensor data can significantly enhance the perception ability of autonomous driving systems. However, uncertain temporal asynchrony and limited communication conditions can cause fusion misalignment and constrain the exploitation of infrastructure data. To address these issues in vehicle-infrastructure cooperative 3D (VIC3D) object detection, we propose the Feature Flow Net (FFNet), a novel flow-based feature fusion framework for cooperative detection. FFNet uses a feature flow prediction module to predict future features and compensate for asynchrony. Instead of transmitting feature maps extracted from still images, FFNet transmits feature flow, leveraging the temporal coherence of sequential infrastructure frames. Furthermore, we introduce a self-supervised training approach that enables FFNet to generate feature flow with feature prediction ability from raw infrastructure sequences. Experimental results on the DAIR-V2X dataset demonstrate that our proposed method outperforms existing cooperative detection methods while requiring only about 1/100 of the transmission cost of raw data, and that a single model covers all latencies. The code is available at \href{https://github.com/haibao-yu/FFNet-VIC3D}{https://github.com/haibao-yu/FFNet-VIC3D}.
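To make the idea of compensating asynchrony with feature flow concrete, the following is a minimal sketch, not the FFNet implementation. It assumes a first-order (linear) extrapolation of an infrastructure feature map over the latency, and it substitutes a simple finite difference between two consecutive frames for the learned feature flow prediction module. The function names `estimate_feature_flow` and `compensate`, the array shapes, and the time values are all hypothetical.

```python
import numpy as np


def estimate_feature_flow(feat_prev, feat_curr, frame_interval):
    """Approximate the feature flow as the per-element rate of change
    between two consecutive infrastructure feature maps.

    Assumption: a finite difference stands in for the learned
    feature flow prediction module described in the abstract.
    """
    return (feat_curr - feat_prev) / frame_interval


def compensate(feat_curr, flow, latency):
    """Predict the feature map `latency` seconds ahead with a
    first-order extrapolation: feat(t + latency) ~ feat(t) + latency * flow.

    Assumption: linear extrapolation is an illustrative choice only.
    """
    return feat_curr + latency * flow


# Hypothetical feature maps with shape (channels, height, width).
rng = np.random.default_rng(0)
feat_prev = rng.standard_normal((4, 8, 8))
true_rate = rng.standard_normal((4, 8, 8))   # synthetic linear drift
frame_interval = 0.1                          # seconds between frames
feat_curr = feat_prev + frame_interval * true_rate

flow = estimate_feature_flow(feat_prev, feat_curr, frame_interval)

# The vehicle receives (feat_curr, flow) and aligns the features to
# its own timestamp, whatever latency it observes.
latency = 0.2
feat_aligned = compensate(feat_curr, flow, latency)
```

Under these assumptions the receiver can reuse the same transmitted pair for any latency value, which is one way to read the claim that a single model covers all latencies. For features that truly change linearly in time, as in the synthetic data above, the extrapolation recovers the future feature map up to floating-point error.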