Reliable traversable area segmentation in unstructured environments is critical for planning and decision-making in autonomous driving. However, existing data-driven approaches often suffer from degraded segmentation performance in out-of-distribution (OOD) scenarios, consequently impairing downstream driving tasks. To address this issue, we propose OT-Drive, an Optimal Transport-driven multi-modal fusion framework. The proposed method formulates RGB and surface normal fusion as a distribution transport problem. Specifically, we design a novel Scene Anchor Generator (SAG) to decompose scene information into the joint distribution of weather, time-of-day, and road type, thereby constructing semantic anchors that generalize to unseen scenarios. Subsequently, we design an Optimal Transport-based multi-modal fusion module (OT Fusion) to transport RGB and surface normal features onto the manifold defined by the semantic anchors, enabling robust traversable area segmentation under OOD scenarios. Experimental results demonstrate that our method achieves 95.16% mIoU on ORFD OOD scenarios, outperforming prior methods by 6.35%, and 89.79% mIoU on cross-dataset transfer tasks, surpassing baselines by 13.99%. These results indicate that the proposed model attains strong OOD generalization with only limited training data, substantially enhancing its practicality and efficiency for real-world deployment.
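The core fusion idea above can be illustrated with a minimal sketch of entropic optimal transport (the Sinkhorn algorithm): fused features are softly assigned to a small set of semantic anchors, then projected onto the anchor manifold via barycentric mapping. This is a toy illustration, not the paper's implementation; the names `feats`, `anchors`, the feature dimension, and the uniform mass assumptions are all hypothetical.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.05, n_iters=200):
    """Entropic-regularized OT plan between marginals a and b for a cost matrix."""
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # scale columns to match marginal b
        u = a / (K @ v)              # scale rows to match marginal a
    return u[:, None] * K * v[None, :]

# Toy setup: 4 pooled multi-modal feature vectors, 3 semantic anchors
# (stand-ins for SAG-style anchors; dimensions are illustrative).
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))      # e.g. fused RGB + surface-normal features
anchors = rng.normal(size=(3, 8))    # e.g. weather/time/road-type anchors
cost = ((feats[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
cost = cost / cost.max()             # normalize so exp(-cost/eps) stays stable
a = np.full(4, 1 / 4)                # uniform mass over features
b = np.full(3, 1 / 3)                # uniform mass over anchors
P = sinkhorn(cost, a, b)             # soft assignment of features to anchors
proj = (P / P.sum(1, keepdims=True)) @ anchors  # barycentric projection onto anchors
```

Here `P` plays the role of a transport plan whose rows say how much of each feature's mass lands on each anchor; replacing features with their barycentric projection is one simple way to realize "transport onto the anchor manifold."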