Texture-Shape Bias Balancing for Robust Synthetic-to-Real Semantic Segmentation in Automotive NIR Imagery

Semantic segmentation is a fundamental component of visual perception in modern automotive systems, enabling pixel-level scene understanding. Near-infrared (NIR) imaging offers stable detection under difficult illumination conditions, but developing domain-specific semantic segmentation models remains challenging because high-quality annotated data from real-world scenarios is scarce. Synthetic datasets offer a scalable alternative, but models trained on synthetic images often suffer performance degradation when transferred to real domains. We present the first systematic study of synthetic-to-real domain adaptation for semantic segmentation of NIR images in the automotive domain. We propose a generative augmentation framework that transforms synthetic images into realistic NIR-style variants via target style adaptation (TSA), which we introduce here. TSA fine-tunes a latent diffusion model via low-rank adaptation on a small curated set of real NIR images and applies it to synthetic training data using structure-preserving multi-signal conditioning. To reduce texture bias and improve segmentation robustness, we further apply a Voronoi-based style diversification strategy (VSD) that modifies the original textures while preserving scene geometry. Experiments with multiple model architectures on NIR data from vehicle interiors and street scenes show that balancing inductive bias during training leads to more robust semantic segmentation and reduces the domain gap in our real-world scenarios by up to 63.6% on exterior and 28.4% on interior data. The code is available on GitHub.
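To make the idea of Voronoi-based texture diversification more concrete, the following is a minimal sketch and not the implementation from this work. It assumes that several differently styled, pixel-aligned variants of the same synthetic image are already available, and it composes a new training image by choosing one variant per Voronoi cell. The function names `voronoi_labels` and `mix_styles_by_cell`, the random seed placement, and the uniform choice of variant per cell are all illustrative assumptions.

```python
import numpy as np


def voronoi_labels(height, width, num_cells, rng):
    """Assign every pixel to its nearest random seed point (a Voronoi partition)."""
    # Hypothetical choice: seeds are drawn uniformly over the image plane.
    seeds = rng.uniform(low=(0, 0), high=(height, width), size=(num_cells, 2))
    ys, xs = np.mgrid[0:height, 0:width]
    pixels = np.stack([ys, xs], axis=-1).reshape(-1, 1, 2).astype(np.float64)
    # Euclidean distance from every pixel to every seed: shape (H*W, num_cells).
    dists = np.linalg.norm(pixels - seeds[None, :, :], axis=-1)
    return dists.argmin(axis=1).reshape(height, width)


def mix_styles_by_cell(variants, labels, rng):
    """Compose one image by picking, per Voronoi cell, one of several styled variants.

    variants: sequence of pixel-aligned arrays, each (H, W) or (H, W, C).
    labels:   integer array (H, W) of Voronoi cell indices.
    """
    variants = np.asarray(variants)
    # One randomly chosen variant index per cell (assumed uniform sampling).
    choice = rng.integers(0, len(variants), size=labels.max() + 1)
    per_pixel = choice[labels]  # variant index for every pixel, shape (H, W)
    ys, xs = np.indices(labels.shape)
    # Advanced indexing gathers each pixel from its selected variant.
    return variants[per_pixel, ys, xs]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = voronoi_labels(32, 48, num_cells=6, rng=rng)
    # Stand-ins for three styled versions of the same scene.
    variants = [np.full((32, 48), v, dtype=np.uint8) for v in (10, 20, 30)]
    mixed = mix_styles_by_cell(variants, labels, rng)
    print(mixed.shape, np.unique(mixed))
```

Because every variant shares the same pixel grid, the segmentation labels of the source image remain valid for the mixed result; only the texture inside each cell changes. This is the property the abstract describes as modifying textures while preserving scene geometry, though the actual VSD procedure may differ in how cells and styles are chosen.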
