Scenario coverage assessment is crucial for evaluating the robustness of autonomous agents, yet existing methods rely on expensive human annotations or computationally intensive Large Vision-Language Models (LVLMs). These approaches are impractical for large-scale deployment due to cost and efficiency constraints. To address these shortcomings, we propose SCOUT (Scenario Coverage Oversight and Understanding Tool), a lightweight surrogate model designed to predict scenario coverage labels directly from an agent's latent sensor representations. SCOUT is trained through a distillation process, learning to approximate LVLM-generated coverage labels while eliminating the need for continuous LVLM inference or human annotation. By leveraging precomputed perception features, SCOUT avoids redundant computation and enables fast, scalable scenario coverage estimation. We evaluate our method on a large dataset of real-world autonomous navigation scenarios, demonstrating that it maintains high accuracy while significantly reducing computational cost. Our results show that SCOUT provides an effective and practical alternative for large-scale coverage analysis. While its performance depends on the quality of the LVLM-generated training labels, SCOUT represents a major step toward efficient scenario coverage oversight in autonomous systems.
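To make the distillation setup concrete, the following Python snippet sketches the idea of fitting a small surrogate head on precomputed perception features against coverage labels produced offline by an LVLM. This is a minimal illustration, not the paper's implementation; the feature dimension, the number of coverage classes, and the random tensors standing in for real data are all hypothetical placeholders.

```python
# Minimal sketch of distilling LVLM-generated coverage labels into a lightweight
# surrogate that operates on precomputed perception features. All dimensions and
# data below are hypothetical placeholders, not the paper's actual configuration.
import torch
import torch.nn as nn

NUM_COVERAGE_CLASSES = 8   # assumed size of the coverage label space
FEATURE_DIM = 512          # assumed size of the agent's latent sensor representation

# Lightweight surrogate: a two-layer MLP over frozen, precomputed features.
surrogate = nn.Sequential(
    nn.Linear(FEATURE_DIM, 128),
    nn.ReLU(),
    nn.Linear(128, NUM_COVERAGE_CLASSES),
)
optimizer = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-ins for a batch of precomputed perception features and the coverage
# labels an LVLM assigned to the same scenarios offline.
features = torch.randn(32, FEATURE_DIM)                  # no LVLM call needed here
lvlm_labels = torch.randint(0, NUM_COVERAGE_CLASSES, (32,))

# One distillation step: match the surrogate's predictions to the LVLM labels.
logits = surrogate(features)
loss = loss_fn(logits, lvlm_labels)
loss.backward()
optimizer.step()
```

At deployment time, only the surrogate's forward pass over already-available perception features is required, which is what removes the per-scenario LVLM inference cost.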