Autonomous aerial robots operating in GPS-denied or communication-degraded environments frequently lose access to camera metadata and telemetry. Without them, onboard perception systems cannot recover the absolute metric scale of the scene. As LLM/VLM-based planners are increasingly adopted as high-level agents for embodied systems, their ability to reason about physical dimensions becomes safety-critical. Yet our experiments show that five state-of-the-art VLMs suffer from spatial scale hallucinations, with median area estimation errors exceeding 50%. We propose VANGUARD, a lightweight, deterministic Geometric Perception Skill designed as a callable tool that any LLM-based agent can invoke to recover Ground Sample Distance (GSD) from ubiquitous environmental anchors: small vehicles detected via oriented bounding boxes. The modal pixel length of these vehicles is estimated robustly through kernel density estimation and converted to GSD using a pre-calibrated reference length. The tool returns both a GSD estimate and a composite confidence score, so the calling agent can decide autonomously whether to trust the measurement or fall back to alternative strategies. On the DOTA v1.5 benchmark, VANGUARD achieves a 6.87% median GSD error on 306 images. When integrated with SAM-based segmentation for downstream area measurement, the pipeline yields a 19.7% median error on a 100-entry benchmark, with 2.6× lower category dependence and 4× fewer catastrophic failures than the best VLM baseline. These results demonstrate that equipping agents with deterministic geometric tools is essential for safe autonomous spatial reasoning.
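The scale-recovery idea can be illustrated with a minimal Python sketch under stated assumptions. The sketch takes the long-side lengths (in pixels) of small-vehicle oriented bounding boxes, locates their mode with a Gaussian kernel density estimate (Silverman's bandwidth rule, grid search), and divides a reference vehicle length by that mode to get GSD in metres per pixel. Area then follows from a mask's pixel count as A = N · GSD². Several pieces are hypothetical choices made for illustration and are not given in the abstract: `REFERENCE_LENGTH_M = 4.5`, the bandwidth rule, `min_count`, and the inlier-fraction "confidence." VANGUARD's actual reference length and its composite confidence score may differ.

```python
import math
import statistics

# Hypothetical pre-calibrated reference length (metres) for a "small vehicle";
# the abstract does not state the value VANGUARD actually uses.
REFERENCE_LENGTH_M = 4.5


def silverman_bandwidth(xs):
    """Silverman's rule-of-thumb bandwidth for a 1-D Gaussian KDE (needs len(xs) >= 2)."""
    sd = statistics.stdev(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    if spread <= 0:
        return 1.0  # all lengths identical: fall back to a 1-pixel bandwidth
    return 0.9 * spread * len(xs) ** (-1 / 5)


def kde_mode(xs, grid_points=512):
    """Return the location of the highest Gaussian-KDE peak, found by grid search."""
    h = silverman_bandwidth(xs)
    lo, hi = min(xs) - 3 * h, max(xs) + 3 * h
    step = (hi - lo) / (grid_points - 1)
    best_x, best_density = lo, -1.0
    for i in range(grid_points):
        x = lo + i * step
        # Unnormalised density: constant factors do not move the argmax.
        density = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def estimate_gsd(obb_lengths_px, reference_length_m=REFERENCE_LENGTH_M,
                 min_count=5, inlier_tol=0.15):
    """Return (gsd_m_per_px, confidence); gsd is None when too few anchors exist.

    The confidence here (fraction of detections within +/- inlier_tol of the mode)
    is a hypothetical stand-in for VANGUARD's composite score.
    """
    lengths = [length for length in obb_lengths_px if length > 0]
    if len(lengths) < min_count:
        return None, 0.0
    mode_px = kde_mode(lengths)
    gsd = reference_length_m / mode_px  # metres per pixel
    inliers = sum(1 for length in lengths if abs(length - mode_px) <= inlier_tol * mode_px)
    return gsd, inliers / len(lengths)


def mask_area_m2(mask_pixel_count, gsd):
    """Convert a segmentation mask's pixel count to square metres: A = N * GSD^2."""
    return mask_pixel_count * gsd ** 2


if __name__ == "__main__":
    # Mostly ~30 px cars, plus a truck-like outlier (80 px) and a truncated box (12 px).
    lengths = [29, 30, 30, 31, 30, 29.5, 30.5, 31, 28.5, 80, 12]
    gsd, conf = estimate_gsd(lengths)
    print(f"GSD ~ {gsd:.3f} m/px, confidence {conf:.2f}")
    print(f"Area of a 10,000-px mask ~ {mask_area_m2(10_000, gsd):.1f} m^2")
```

Taking the KDE mode rather than the mean keeps the estimate anchored on the dominant vehicle size, so larger vehicles and truncated detections barely shift it. A calling agent could accept the GSD only when the returned confidence clears some threshold and otherwise fall back to another strategy. Any such threshold is likewise an assumption.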