Millimeter-wave (mmWave) radar provides reliable perception in visually degraded indoor environments (e.g., smoke, dust, and low light), but learning-based radar perception is bottlenecked by the scarcity and cost of collecting and annotating large-scale radar datasets. We present Sim2Radar, an end-to-end framework that synthesizes radar training data directly from single-view RGB images, enabling scalable data generation without manual scene modeling. Sim2Radar reconstructs a material-aware 3D scene by combining monocular depth estimation, segmentation, and vision-language reasoning to infer object materials, then simulates mmWave propagation with a configurable physics-based ray tracer whose Fresnel reflection models are parameterized by ITU-R electromagnetic material properties. Evaluated on real-world indoor scenes, Sim2Radar improves downstream 3D radar perception via transfer learning: pre-training a radar point-cloud object detection model on synthetic data and fine-tuning on real radar yields up to +3.7 3D AP (at IoU 0.3), with gains driven primarily by improved spatial localization. These results suggest that physics-based, vision-driven radar simulation can provide effective geometric priors for radar learning and measurably improve performance under limited real-data supervision.
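To make the reflection model concrete, the short Python/NumPy sketch below evaluates an ITU-R P.2040-style complex relative permittivity, eps_r = a*f^b - j*17.98*(c*f^d)/f with f in GHz, and the resulting Fresnel power reflection coefficients at an air/material interface. This is an illustrative sketch, not Sim2Radar's released code: the ITU_MATERIALS table and function names are hypothetical placeholders, although the concrete and wood (a, b, c, d) entries are the published P.2040 values.

import numpy as np

# ITU-R P.2040-style material parameters (a, b, c, d):
#   eps' = a * f**b,  sigma = c * f**d  (f in GHz, sigma in S/m).
# The dictionary is an illustrative stand-in for a material table;
# the concrete and wood rows use the published P.2040 values.
ITU_MATERIALS = {
    "concrete": (5.31, 0.0, 0.0326, 0.8095),  # valid 1-100 GHz
    "wood":     (1.99, 0.0, 0.0047, 1.0718),  # valid 0.001-100 GHz
}

def complex_permittivity(material: str, f_ghz: float) -> complex:
    """Complex relative permittivity eps' - j*17.98*sigma/f (ITU-R P.2040)."""
    a, b, c, d = ITU_MATERIALS[material]
    eps_real = a * f_ghz**b
    sigma = c * f_ghz**d
    return eps_real - 1j * 17.98 * sigma / f_ghz

def fresnel_reflection(material: str, f_ghz: float, theta_i_rad: float):
    """Power reflection coefficients |Gamma|^2 for TE and TM polarization
    at an air/material interface, per the standard Fresnel equations."""
    eps = complex_permittivity(material, f_ghz)
    cos_t = np.cos(theta_i_rad)
    root = np.sqrt(eps - np.sin(theta_i_rad) ** 2)  # complex sqrt
    gamma_te = (cos_t - root) / (cos_t + root)
    gamma_tm = (eps * cos_t - root) / (eps * cos_t + root)
    return abs(gamma_te) ** 2, abs(gamma_tm) ** 2

if __name__ == "__main__":
    # A 77 GHz ray hitting a concrete wall at 30 degrees incidence.
    r_te, r_tm = fresnel_reflection("concrete", 77.0, np.radians(30.0))
    print(f"TE reflected power: {r_te:.3f}, TM reflected power: {r_tm:.3f}")

In a ray tracer of the kind the abstract describes, such per-polarization reflected-power factors would be applied at each bounce, so that material choice directly shapes the intensity of the simulated radar returns.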