Open-world interactive object search in household environments requires understanding the semantic relationships between objects and their surrounding context in order to guide exploration efficiently. Prior methods rely either on vision-language embedding similarity, which does not reliably capture task-relevant relational semantics, or on large language models (LLMs), which are too slow and costly for real-time deployment. We introduce SCOUT: Scene Graph-Based Exploration with Learned Utility for Open-World Interactive Object Search, a novel method that searches directly over 3D scene graphs by assigning utility scores to rooms, frontiers, and objects using relational exploration heuristics such as room-object containment and object-object co-occurrence. To make this practical without sacrificing open-vocabulary generalization, we propose an offline procedural distillation framework that extracts structured relational knowledge from LLMs into lightweight models for on-robot inference. Furthermore, we present SymSearch, a scalable symbolic benchmark for evaluating semantic reasoning in interactive object search tasks. Extensive evaluations across symbolic and simulation environments show that SCOUT outperforms embedding-similarity-based methods and matches LLM-level performance while remaining computationally efficient. Finally, real-world experiments demonstrate effective transfer to physical environments, enabling open-world interactive object search under realistic sensing and navigation constraints.
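As a rough illustration of what assigning utility scores to scene-graph nodes could look like, the following minimal sketch ranks candidate nodes for a target object. It is not the authors' implementation: the `Node` class, the `utility` and `next_goal` functions, the hand-written prior tables, and the choice to multiply the two priors are all assumptions made for this example. In SCOUT the relational scores would come from lightweight models distilled offline from an LLM.

```python
from dataclasses import dataclass

# Hypothetical relational priors, hard-coded here for illustration only.
# Keys are (context, target) pairs; values are scores in [0, 1].
ROOM_CONTAINS = {("kitchen", "mug"): 0.8, ("bedroom", "mug"): 0.1}
CO_OCCURS = {("coffee machine", "mug"): 0.9, ("bed", "mug"): 0.05}


@dataclass
class Node:
    name: str
    kind: str  # "room", "object", or "frontier"
    room: str  # room the node lies in (a room node names itself)


def utility(node: Node, target: str) -> float:
    """Score a scene-graph node for how promising it is when searching for target."""
    # Room-object containment: how likely the node's room is to hold the target.
    room_score = ROOM_CONTAINS.get((node.room, target), 0.0)
    if node.kind == "object":
        # Object-object co-occurrence, combined multiplicatively (an assumption).
        return room_score * CO_OCCURS.get((node.name, target), 0.0)
    # Rooms and frontiers fall back to the containment prior alone.
    return room_score


def next_goal(nodes: list[Node], target: str) -> Node:
    """Pick the candidate node with the highest utility for the target."""
    return max(nodes, key=lambda n: utility(n, target))
```

A search loop built on this sketch would call `next_goal` on the current candidates, navigate to or interact with the returned node, update the scene graph with new observations, and repeat until the target is found.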