Accurate radio-frequency (RF) material parameters are essential for electromagnetic digital twins in 6G systems, yet gradient-based inverse ray tracing (RT) remains sensitive to initialization and is costly when measurements are limited. This paper proposes a vision-language-model (VLM) guided framework that accelerates and stabilizes multi-material parameter estimation in a differentiable RT (DRT) engine. A VLM parses scene images to infer material categories and maps them to quantitative priors via an ITU-R material table, yielding informed conductivity initializations. The VLM also selects informative transmitter/receiver placements that promote diverse, material-discriminative propagation paths. Starting from these priors, the DRT engine refines the parameters by gradient descent on measured received signal strengths. Experiments in NVIDIA Sionna on indoor scenes show 2--4$\times$ faster convergence and 10--100$\times$ lower final parameter error than baselines with uniform or random initialization and random placement, achieving sub-0.1\% mean relative error with only a few receivers. Complexity analysis indicates that per-iteration time scales near-linearly with the number of materials and measurement setups, while VLM-guided placement reduces the number of measurements required for accurate recovery. Ablations over RT depth and ray count show further accuracy gains without significant per-iteration overhead. The results demonstrate that semantic priors from VLMs can effectively guide physics-based optimization for fast and reliable RF material estimation.
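The pipeline summarized in the abstract (material label, table-based conductivity prior, gradient refinement on received signal strength) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the table uses the power-law form $\sigma = c f^{d}$ (with $f$ in GHz) found in ITU-R P.2040, the coefficients are placeholder values that should be checked against the recommendation, the label list stands in for a VLM output, and `toy_rss` is a linear toy model rather than Sionna's differentiable ray tracer.

```python
import math

# Placeholder (c, d) coefficients for sigma = c * f_GHz**d in S/m.
# Assumption: the power-law form used in ITU-R P.2040; the numbers are
# illustrative and should be checked against the recommendation.
MATERIAL_TABLE = {
    "concrete": (0.0462, 0.7822),
    "glass": (0.0036, 1.3394),
    "wood": (0.0047, 1.0718),
}


def prior_conductivity(label, f_ghz):
    """Map a material label (e.g. from a VLM) to a conductivity prior."""
    c, d = MATERIAL_TABLE[label]
    return c * f_ghz ** d


def toy_rss(log_sigma, weights):
    """Toy stand-in for a differentiable ray tracer (NOT Sionna): each
    receiver's RSS is a weighted sum of the materials' log-conductivities."""
    return [sum(w * x for w, x in zip(row, log_sigma)) for row in weights]


def refine(log_sigma, weights, measured, lr=0.3, tol=1e-8, max_steps=1000):
    """Gradient descent on the mean squared RSS error.
    Returns the refined log-conductivities and the number of steps taken."""
    x = list(log_sigma)
    n = len(measured)
    for step in range(max_steps):
        resid = [p - m for p, m in zip(toy_rss(x, weights), measured)]
        if sum(r * r for r in resid) / n < tol:
            return x, step
        # d/dx_j of (1/n) * sum_i r_i**2 is (2/n) * sum_i r_i * A[i][j].
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(resid, weights))
                for j in range(len(x))]
        x = [xi - lr * g for xi, g in zip(x, grad)]
    return x, max_steps


labels = ["concrete", "glass", "wood"]  # hypothetical VLM output
f_ghz = 3.5                              # assumed carrier frequency
priors = [prior_conductivity(m, f_ghz) for m in labels]
# Hypothetical ground truth: the real materials deviate from the table values.
true_log = [math.log(p * k) for p, k in zip(priors, (1.2, 0.8, 1.1))]
# Hypothetical path weights: one row per receiver, one column per material.
weights = [[1.2, 0.2, 0.2], [0.2, 1.2, 0.2], [0.2, 0.2, 1.2]]
measured = toy_rss(true_log, weights)

x_prior, steps_prior = refine([math.log(p) for p in priors], weights, measured)
x_unif, steps_unif = refine([math.log(0.1)] * 3, weights, measured)
print("steps with table prior:", steps_prior, "| uniform init:", steps_unif)
```

In this toy setting the table-based initialization starts closer to the hypothetical ground truth and therefore reaches the loss tolerance in fewer steps than the uniform 0.1 S/m start. Because the toy loss is convex, it only illustrates the convergence-speed effect of a good prior; it does not reproduce the initialization sensitivity of a real ray-traced loss or the speedups and error reductions reported in the abstract.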