Rectified Schrödinger Bridge Matching for Few-Step Visual Navigation

Visual navigation is a core challenge in Embodied AI: an autonomous agent must translate high-dimensional sensory observations into continuous, long-horizon action trajectories. Generative policies based on diffusion models and Schrödinger Bridges (SB) capture multimodal action distributions effectively, but their high-variance stochastic transport requires dozens of integration steps, a critical barrier for real-time robotic control. We propose Rectified Schrödinger Bridge Matching (RSBM), a framework built on a single entropic regularization parameter $\varepsilon$ that interpolates between standard Schrödinger Bridges ($\varepsilon=1$, maximum-entropy transport) and deterministic Optimal Transport ($\varepsilon\to 0$, as in Conditional Flow Matching), and that exploits the velocity-field structure shared by the two regimes. We prove two key results: (1) the functional form of the conditional velocity field is invariant across the entire $\varepsilon$-spectrum (Velocity Structure Invariance), so a single network can serve all regularization strengths; and (2) the conditional velocity variance decreases linearly as $\varepsilon$ is reduced, which makes coarse-step ODE integration more stable. Anchored to a learned conditional prior that shortens the transport distance, RSBM operates at an intermediate $\varepsilon$ that balances multimodal coverage against path straightness. Empirically, whereas standard bridges require $\geq 10$ steps to converge, RSBM achieves over 94% cosine similarity and a 92% success rate in only 3 integration steps, without distillation or multi-stage training, substantially narrowing the gap between high-fidelity generative policies and the low-latency demands of Embodied AI.
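
The two stated results can be illustrated numerically with a minimal sketch. It assumes the conditional path between endpoints $x_0$ and $x_1$ is a Brownian bridge with variance $\varepsilon\, t(1-t)$, the usual construction in bridge matching; the abstract does not spell out the path, so this is an assumption and not necessarily the exact RSBM formulation. The function names (`bridge_sample`, `conditional_velocity`, `velocity_variance`) are hypothetical and chosen only for this example.

```python
import numpy as np

def bridge_sample(x0, x1, t, eps, rng):
    """Sample x_t from the ASSUMED Brownian-bridge path with variance eps*t*(1-t)."""
    mean = (1.0 - t) * x0 + t * x1
    std = np.sqrt(eps * t * (1.0 - t))
    return mean + std * rng.standard_normal(x0.shape)

def conditional_velocity(x, x0, x1, t):
    """Conditional velocity of the assumed bridge path.

    eps is not an argument: as a function of (x, x0, x1, t) the formula is
    the same for every eps, which is the invariance the abstract describes.
    """
    mean = (1.0 - t) * x0 + t * x1
    return (1.0 - 2.0 * t) / (2.0 * t * (1.0 - t)) * (x - mean) + (x1 - x0)

def velocity_variance(eps, t=0.25, n=200_000, seed=0):
    """Monte Carlo estimate of the velocity variance for fixed endpoints (1-D toy)."""
    rng = np.random.default_rng(seed)
    x0 = np.zeros(n)  # toy source point, repeated n times
    x1 = np.ones(n)   # toy target point, repeated n times
    xt = bridge_sample(x0, x1, t, eps, rng)
    return conditional_velocity(xt, x0, x1, t).var()

if __name__ == "__main__":
    for eps in (1.0, 0.5, 0.1, 0.0):
        print(f"eps={eps:4.1f}  velocity variance ~ {velocity_variance(eps):.4f}")
```

Under this assumption the variance for fixed endpoints is $\varepsilon\,(1-2t)^2 / (4t(1-t))$, so halving $\varepsilon$ halves the variance, and $\varepsilon = 0$ recovers the deterministic straight-line velocity $x_1 - x_0$ used in Conditional Flow Matching.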
