This work addresses spatial question answering for service robots traversing long egocentric routes. Given a query such as "where can I find a dry cleaner on the way back home?", the system returns a metric coordinate that downstream navigation components can act on. Prior Spatial Question Answering approaches rely on retrieval-augmented agents built on closed-source models such as GPT-4o to explore the path. However, robots operating in the real world often cannot reliably depend on online closed-source models because of network instability, communication latency, and deployment cost. This creates a need for open-source Spatial Question Answering approaches that can run onboard the robot, yet prior research in this direction remains limited. This work proposes BinTrack, a simple yet effective, fully open-source spatial-localization agent that exploits the temporal ordering of a robot's trajectory. BinTrack performs a binary search over the trajectory segments between two anchor landmarks identified from the query. It improves overall accuracy by up to 22.8% over other open-source implementations and even matches the reported closed-source result on the global category of the SpaceLocQA benchmark, the most challenging setting, which has so far required strong reasoning agents such as GPT-4o. Furthermore, its optimized inference strategy consistently yields a speedup of more than 1.5x over previous approaches. Finally, this work releases GangnamLoop, a novel and practical multi-trip outdoor benchmark collected by deploying a real quadruped robot on public streets under an anonymization policy. It revisits the same locations under different outdoor conditions and pairs the robot's low viewpoint with that of its human owner. The source code and datasets are publicly available at https://github.com/ndb796/BinaryTracking
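The abstract does not spell out the search procedure, so the following is only a minimal Python sketch of the general idea, under stated assumptions: the route is a list of temporally ordered frames, the two anchor landmarks have already been resolved to frame indices, and a yes/no predicate `target_reached(i)` (a hypothetical stand-in for a query to an open-source vision-language model) is monotone along the route, answering False before the target location and True from it onward. `Frame`, `binary_track`, and `target_reached` are illustrative names, not the released BinTrack API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Frame:
    """One egocentric observation along the route (hypothetical schema)."""
    timestamp: float
    position: Tuple[float, float]  # metric (x, y) coordinate in the map frame


def binary_track(
    frames: List[Frame],
    start_anchor: int,
    end_anchor: int,
    target_reached: Callable[[int], bool],
) -> Tuple[int, Tuple[float, float], int]:
    """Return (frame index, metric position, number of predicate calls) for the
    first frame in [start_anchor, end_anchor] where `target_reached` is True.

    Assumes `target_reached` is monotone over the segment (hypothetical oracle).
    If it is never True, the search falls back to `end_anchor`.
    """
    if not (0 <= start_anchor <= end_anchor < len(frames)):
        raise ValueError("anchors must index a valid, temporally ordered segment")
    lo, hi = start_anchor, end_anchor
    calls = 0
    while lo < hi:
        mid = (lo + hi) // 2
        calls += 1
        if target_reached(mid):
            hi = mid  # target lies at or before mid
        else:
            lo = mid + 1  # target lies strictly after mid
    return lo, frames[lo].position, calls


# Toy route: 1,000 frames, moving 1 m per frame along the x-axis.
route = [Frame(timestamp=float(t), position=(float(t), 0.0)) for t in range(1000)]
# Stand-in for a model judgment: the target first appears at frame 637,
# and the anchors bound the search to frames 120..900.
idx, pos, calls = binary_track(route, 120, 900, lambda i: i >= 637)
print(idx, pos, calls)  # 637 (637.0, 0.0) 10
```

Each step halves the candidate segment, so the toy example locates the target among 781 frames with 10 predicate calls, whereas a linear scan from the first anchor would need 518. This illustrates why exploiting temporal ordering scales well with route length; how BinTrack actually formulates each step may differ from this sketch.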