Visual navigation is a fundamental problem in embodied AI, yet practical deployments demand long-horizon planning capabilities to address multi-objective tasks. A major bottleneck is data scarcity: policies learned from limited data often overfit and fail to generalize out of distribution (OOD). Existing neural network-based agents typically increase architectural complexity, which paradoxically becomes counterproductive in the small-sample regime. This paper introduces NeuRO, an integrated learning-to-optimize framework that tightly couples perception networks with downstream task-level robust optimization. Specifically, NeuRO addresses two core difficulties in this integration: (i) it transforms noisy visual predictions under data scarcity into convex uncertainty sets using Partially Input Convex Neural Networks (PICNNs) with conformal calibration, and these sets directly parameterize the optimization constraints; and (ii) it reformulates planning under partial observability as a robust optimization problem, enabling uncertainty-aware policies that transfer across environments. Extensive experiments on both unordered and sequential multi-object navigation tasks demonstrate that NeuRO achieves state-of-the-art performance, particularly in generalization to unseen environments. Our work thus advances the development of robust, generalizable autonomous agents.
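To make the idea in (i) and (ii) concrete, the following is a minimal sketch of split conformal calibration feeding a robust objective. It is not NeuRO's implementation: the abstract does not specify the score function, so a Euclidean residual stands in for the PICNN score (both are convex in the prediction target, so the resulting set is convex), and all function names here are hypothetical.

```python
import math
import numpy as np

def residual_score(y, y_hat):
    """Nonconformity score: Euclidean distance between target and prediction.
    Assumption: stands in for a learned PICNN score that is convex in y."""
    return float(np.linalg.norm(np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)))

def conformal_threshold(scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest
    calibration score. Returns inf when there are too few calibration points."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return float(np.sort(np.asarray(scores, dtype=float))[k - 1])

def worst_case_linear_cost(c, y_hat, radius):
    """Worst case of c^T y over the ball {y : ||y - y_hat||_2 <= radius},
    which has the closed form c^T y_hat + radius * ||c||_2."""
    c = np.asarray(c, dtype=float)
    return float(c @ np.asarray(y_hat, dtype=float) + radius * np.linalg.norm(c))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical calibration data: predictions plus Gaussian noise as targets.
    preds = rng.normal(size=(200, 2))
    targets = preds + 0.1 * rng.normal(size=(200, 2))
    scores = [residual_score(y, p) for y, p in zip(targets, preds)]
    radius = conformal_threshold(scores, alpha=0.1)
    print("calibrated radius:", radius)
    print("robust cost:", worst_case_linear_cost([1.0, 2.0], preds[0], radius))
```

Under the usual exchangeability assumption, the set `{y : score(y, y_hat) <= radius}` covers the true target with probability at least `1 - alpha`. A downstream planner can then optimize against the worst case inside that set, which is the role the abstract assigns to the robust optimization stage.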