Visual navigation is a fundamental problem in embodied AI, yet practical deployments demand long-horizon planning capabilities to address multi-objective tasks. A major bottleneck is data scarcity: policies learned from limited data often overfit and fail to generalize out of distribution (OOD). Existing neural-network-based agents typically increase architectural complexity, which paradoxically becomes counterproductive in the small-sample regime. This paper introduces NeuRO, an integrated learning-to-optimize framework that tightly couples perception networks with downstream task-level robust optimization. Specifically, NeuRO addresses two core difficulties in this integration: (i) it transforms noisy visual predictions under data scarcity into convex uncertainty sets using Partially Input Convex Neural Networks (PICNNs) with conformal calibration, and these sets directly parameterize the optimization constraints; and (ii) it reformulates planning under partial observability as a robust optimization problem, enabling uncertainty-aware policies that transfer across environments. Extensive experiments on both unordered and sequential multi-object navigation tasks demonstrate that NeuRO achieves state-of-the-art (SoTA) performance, particularly in generalization to unseen environments. Our work thus represents a step toward robust, generalizable autonomous agents.
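As a rough illustration of point (i), the sketch below shows how split conformal calibration can turn a score function that is convex in the target into a convex uncertainty set. It is a minimal sketch under stated assumptions, not NeuRO's implementation: the squared-distance `score` is a hypothetical stand-in for a PICNN, and the function names and toy calibration data are invented for illustration.

```python
import math

def conformal_threshold(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to certify this coverage level.
        return math.inf
    return sorted(scores)[rank - 1]

def score(prediction, y):
    # Hypothetical stand-in for a PICNN: squared Euclidean distance,
    # which is convex in y for any fixed prediction.
    return sum((p - v) ** 2 for p, v in zip(prediction, y))

def in_uncertainty_set(prediction, y, q):
    # The set {y : score(prediction, y) <= q} is a sublevel set of a
    # convex function, hence convex.
    return score(prediction, y) <= q

# Toy calibration pairs (prediction, ground truth); values are made up.
calibration = [((0.0, 0.0), (0.1 * k, 0.0)) for k in range(1, 20)]
cal_scores = [score(p, y) for p, y in calibration]

# Threshold targeting roughly 90% marginal coverage.
q = conformal_threshold(cal_scores, alpha=0.1)
covered = sum(in_uncertainty_set(p, y, q) for p, y in calibration)
```

Because the calibrated set is a sublevel set of a function convex in `y`, a constraint of the form `score(prediction, y) <= q` can in principle be passed to a downstream robust optimization problem as a convex constraint, which is the role the abstract assigns to the PICNN-based sets.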