Real-world deployment of Vision-and-Language Navigation (VLN) agents is constrained by the scarcity of reliable supervision after offline training. While recent adaptation methods attempt to mitigate distribution shift via environment-driven self-supervision (e.g., entropy minimization), these signals are often noisy and can cause the agent to amplify its own mistakes during long-horizon sequential decision-making. In this paper, we propose a paradigm shift that positions user feedback, specifically episode-level success confirmations and goal-level corrections, as a primary and general-purpose supervision signal for VLN. Unlike internal confidence scores, user feedback is intent-aligned and consistent in situ, directly correcting the agent's drift away from user instructions. To leverage this supervision effectively, we introduce a user-feedback-driven learning framework built around a topology-aware trajectory construction pipeline. This mechanism lifts sparse, goal-level corrections into dense, path-level supervision by generating feasible paths on the agent's incrementally built topological graph, enabling sample-efficient imitation learning without step-by-step human demonstrations. Furthermore, we develop a persistent memory bank for warm-start initialization, supporting the reuse of previously acquired topology and cached representations across navigation sessions. Extensive experiments on the GSA-R2R benchmark demonstrate that our approach transforms sparse interaction into robust supervision, consistently outperforming environment-driven baselines while exhibiting strong adaptability across diverse instruction styles.
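The core idea of the topology-aware trajectory construction can be sketched as a graph search: given a user's goal-level correction, the agent finds a feasible path from its current node to the corrected goal on its incrementally built topological graph, and each edge of that path becomes a dense imitation target. The sketch below is illustrative only, assuming a simple adjacency-list graph and plain BFS; the function name `lift_goal_correction` and the graph representation are hypothetical, not the paper's actual implementation.

```python
from collections import deque

def lift_goal_correction(graph, current_node, corrected_goal):
    """Hypothetical sketch: lift a sparse goal-level correction into
    dense path-level supervision by finding a feasible path on the
    agent's incrementally built topological graph via BFS.

    graph: dict mapping each node id -> list of adjacent node ids.
    Returns the node sequence from current_node to corrected_goal,
    or None if the goal is unreachable on the current graph.
    """
    parent = {current_node: None}
    queue = deque([current_node])
    while queue:
        node = queue.popleft()
        if node == corrected_goal:
            # Reconstruct the path; each edge along it becomes one
            # step-level imitation-learning target.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # goal not yet reachable on the explored topology
```

Because the path is generated from the agent's own explored topology, the resulting supervision is guaranteed feasible for the agent, which is what allows sample-efficient imitation learning without step-by-step human demonstrations.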