DUSA: Decoupled Unsupervised Sim2Real Adaptation for Vehicle-to-Everything Collaborative Perception

Vehicle-to-Everything (V2X) collaborative perception is crucial for autonomous driving. However, achieving high-precision V2X perception requires a large amount of annotated real-world data, which is often expensive and hard to acquire. Simulated data have attracted much attention because they can be produced at scale at extremely low cost. Nevertheless, the significant domain gap between simulated and real-world data, including differences in sensor type, reflectance patterns, and road surroundings, often leads to poor performance when models trained on simulated data are evaluated on real-world data. In addition, a domain gap remains between real-world collaborative agents: for example, autonomous vehicles and roadside infrastructure may carry different types of sensors with different extrinsics, which further increases the difficulty of sim2real generalization. To take full advantage of simulated data, we present a new unsupervised sim2real domain adaptation method for V2X collaborative detection named Decoupled Unsupervised Sim2Real Adaptation (DUSA). Our method decouples the V2X collaborative sim2real domain adaptation problem into two sub-problems: sim2real adaptation and inter-agent adaptation. For sim2real adaptation, we design a Location-adaptive Sim2Real Adapter (LSA) module that adaptively aggregates features from critical locations of the feature map and aligns the features of simulated and real-world data via a sim/real discriminator applied to the aggregated global feature. For inter-agent adaptation, we further devise a Confidence-aware Inter-agent Adapter (CIA) module that aligns the fine-grained features from heterogeneous agents under the guidance of agent-wise confidence maps. Experiments demonstrate the effectiveness of the proposed DUSA approach on unsupervised sim2real adaptation from the simulated V2XSet dataset to the real-world DAIR-V2X-C dataset.
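
To make the two sub-problems more concrete, the following is a minimal sketch in plain Python, not the DUSA implementation. It assumes that location-adaptive aggregation can be approximated by a softmax-weighted sum over feature-map locations, that the sim/real discriminator is a single linear layer trained with binary cross-entropy, and that confidence guidance amounts to a confidence-weighted average of per-location alignment losses. All function names and parameters here (location_adaptive_pool, discriminator_loss, confidence_weighted_mean) are hypothetical and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of importance logits."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def location_adaptive_pool(features, scores):
    """Aggregate N per-location feature vectors into one global feature.

    features: list of N vectors, each of length C (one per map location).
    scores:   list of N importance logits (assumed to be predicted elsewhere).
    Locations with higher scores contribute more to the global feature.
    """
    weights = softmax(scores)
    channels = len(features[0])
    return [
        sum(w * f[c] for w, f in zip(weights, features))
        for c in range(channels)
    ]


def discriminator_loss(global_feature, disc_weights, disc_bias, is_real):
    """Binary cross-entropy of an assumed linear sim/real discriminator."""
    logit = sum(w * x for w, x in zip(disc_weights, global_feature)) + disc_bias
    p_real = 1.0 / (1.0 + math.exp(-logit))
    eps = 1e-12  # guards against log(0)
    return -math.log(p_real + eps) if is_real else -math.log(1.0 - p_real + eps)


def confidence_weighted_mean(per_location_losses, confidence):
    """Average per-location alignment losses, weighted by a confidence map."""
    total_conf = sum(confidence)
    weighted = sum(c * l for c, l in zip(confidence, per_location_losses))
    return weighted / total_conf


# Two locations with two channels each; equal scores give a plain average.
pooled = location_adaptive_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
# An untrained (all-zero) discriminator is maximally uncertain: loss = ln 2.
loss_real = discriminator_loss(pooled, [0.0, 0.0], 0.0, is_real=True)
# High-confidence locations dominate the inter-agent alignment loss.
cia_loss = confidence_weighted_mean([1.0, 3.0], [1.0, 3.0])
```

In an adversarial setup the feature extractor would be trained to increase the discriminator loss while the discriminator is trained to decrease it; that training loop is omitted here, and the abstract does not specify how it is realized.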
