Automated Valet Parking (AVP) requires precise localization in challenging garage conditions, including poor lighting, sparse textures, repetitive structures, dynamic scenes, and the absence of Global Positioning System (GPS) signals, all of which often cause problems for conventional localization methods. To address these challenges, we present AVM-SLAM, a semantic visual SLAM framework that performs multi-sensor fusion in Bird's Eye View (BEV). Our framework integrates four fisheye cameras, four wheel encoders, and an Inertial Measurement Unit (IMU). The fisheye cameras form an Around View Monitor (AVM) subsystem that generates BEV images. Convolutional Neural Networks (CNNs) extract semantic features from these images to support mapping and localization. These semantic features provide long-term stability and perspective invariance, which mitigates the environmental challenges listed above. In addition, fusing data from the wheel encoders and the IMU makes the system more robust by improving motion estimation and reducing drift. To validate AVM-SLAM's efficacy and robustness, we provide a large-scale, high-resolution underground garage dataset, available at https://github.com/yale-cv/avm-slam. This dataset enables researchers to further explore and assess AVM-SLAM in similar environments.
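To make the wheel-encoder and IMU fusion mentioned above more concrete, the following is a minimal dead-reckoning sketch in Python. It is not the AVM-SLAM implementation: the abstract does not specify the fusion scheme, so this example simply assumes that travelled distance comes from the mean of four wheel speeds and that heading change comes from the IMU yaw rate. The names `Pose2D`, `wrap_angle`, and `propagate` are hypothetical and chosen for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose2D:
    """Planar vehicle pose: position in metres, yaw in radians."""
    x: float = 0.0
    y: float = 0.0
    yaw: float = 0.0


def wrap_angle(angle: float) -> float:
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def propagate(pose: Pose2D, wheel_speeds: list[float],
              gyro_yaw_rate: float, dt: float) -> Pose2D:
    """One dead-reckoning step (illustrative assumption, not the paper's method).

    wheel_speeds  : linear speed of each wheel in m/s (from encoder ticks)
    gyro_yaw_rate : IMU yaw rate in rad/s
    dt            : time step in seconds
    """
    # Assumption: vehicle speed is the mean of the wheel speeds.
    speed = sum(wheel_speeds) / len(wheel_speeds)
    # Assumption: heading change is taken from the gyro alone.
    delta_yaw = gyro_yaw_rate * dt
    # Midpoint integration: move along the heading halfway through the step.
    mid_yaw = pose.yaw + 0.5 * delta_yaw
    return Pose2D(
        x=pose.x + speed * dt * math.cos(mid_yaw),
        y=pose.y + speed * dt * math.sin(mid_yaw),
        yaw=wrap_angle(pose.yaw + delta_yaw),
    )


if __name__ == "__main__":
    pose = Pose2D()
    # Drive straight at 1 m/s for 1 s in ten 0.1 s steps.
    for _ in range(10):
        pose = propagate(pose, [1.0, 1.0, 1.0, 1.0], 0.0, 0.1)
    print(pose)
```

In a full system, a prediction of this kind would typically serve only as a motion prior that is then corrected by the semantic features extracted from the BEV images. How AVM-SLAM combines the two is not described in the abstract.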