Vision-Language-Model-Guided Differentiable Ray Tracing for Fast and Accurate Multi-Material RF Parameter Estimation

Accurate radio-frequency (RF) material parameters are essential for electromagnetic digital twins in 6G systems, yet gradient-based inverse ray tracing (RT) remains sensitive to initialization and is costly when measurements are limited. This paper proposes a vision-language-model (VLM)-guided framework that accelerates and stabilizes multi-material parameter estimation in a differentiable RT (DRT) engine. A VLM parses scene images to infer material categories and maps them to quantitative priors via an ITU-R material table, yielding informed conductivity initializations. The VLM further selects informative transmitter/receiver placements that promote diverse, material-discriminative paths. Starting from these priors, the DRT engine performs gradient-based refinement using measured received signal strengths. Experiments in NVIDIA Sionna on indoor scenes show 2--4$\times$ faster convergence and 10--100$\times$ lower final parameter error compared with uniform or random initialization and random placement baselines, achieving sub-0.1\% mean relative error with only a few receivers. Complexity analyses indicate that per-iteration time scales near-linearly with the number of materials and measurement setups, while VLM-guided placement reduces the number of measurements required for accurate recovery. Ablations over RT depth and ray count confirm further accuracy gains without significant per-iteration overhead. The results demonstrate that semantic priors from VLMs effectively guide physics-based optimization for fast and reliable RF material estimation.
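
The pipeline described in the abstract (material category, then table-based conductivity prior, then gradient-based refinement) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the table coefficients follow the style of the ITU-R P.2040 frequency model (conductivity = c * f_GHz^d) and should be checked against the recommendation, the VLM output is replaced by a hard-coded list of categories, and the forward model is a toy linear surrogate rather than the Sionna DRT engine. The names `ITU_TABLE`, `prior_conductivity`, and `refine` are hypothetical.

```python
import math

# Illustrative (c, d) coefficients in the style of the ITU-R P.2040 model,
# conductivity [S/m] = c * f_GHz ** d. Assumption: verify against the
# recommendation before any real use.
ITU_TABLE = {
    "concrete": (0.0326, 0.8095),
    "wood": (0.0047, 1.0718),
}

def prior_conductivity(category, f_ghz):
    """Map a material category to a conductivity prior at f_ghz."""
    c, d = ITU_TABLE[category]
    return c * f_ghz ** d

def refine(sigma0, A, y, lr=0.5, tol=1e-6, max_iter=1000):
    """Gradient descent on 0.5 * ||A sigma - y||^2 (toy surrogate model).

    Returns the estimate and the number of gradient steps taken.
    """
    sigma = list(sigma0)
    for it in range(max_iter):
        # Residual between simulated and "measured" values.
        r = [sum(a * s for a, s in zip(row, sigma)) - ym
             for row, ym in zip(A, y)]
        if math.sqrt(sum(x * x for x in r)) < tol:
            return sigma, it
        # Gradient of the least-squares loss: A^T r.
        grad = [sum(A[m][k] * r[m] for m in range(len(A)))
                for k in range(len(sigma))]
        sigma = [s - lr * g for s, g in zip(sigma, grad)]
    return sigma, max_iter

# Stand-in for the VLM output: one category per scene material.
categories = ["concrete", "wood"]
f_ghz = 3.5
sigma_prior = [prior_conductivity(c, f_ghz) for c in categories]

# Hypothetical ground truth: within 10% of the table values.
sigma_true = [1.1 * sigma_prior[0], 0.9 * sigma_prior[1]]

# Toy measurement matrix: each row says which materials a path touches.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [sum(a * s for a, s in zip(row, sigma_true)) for row in A]

est_prior, n_prior = refine(sigma_prior, A, y)
est_uniform, n_uniform = refine([1.0, 1.0], A, y)
print("steps from table prior:", n_prior)
print("steps from uniform init:", n_uniform)
```

Because the surrogate is linear and convex, both runs reach the same solution and the prior only reduces the number of steps. The stabilization benefit claimed for the non-convex DRT problem is not something this toy can reproduce.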
