Robotic assistance in scientific laboratories requires procedurally correct long-horizon manipulation, reliable execution under limited supervision, and robustness in low-demonstration regimes. These conditions pose a severe challenge to end-to-end vision-language-action (VLA) models, whose assumptions of recoverable errors and data-driven policy learning often break down in protocol-sensitive experiments. We propose CAPER (Constrained And ProcEdural Reasoning), a framework for robotic scientific experiments that explicitly restricts where learning and reasoning occur in the planning and control pipeline. Rather than strengthening end-to-end policies, CAPER enforces a responsibility-separated structure: task-level reasoning generates procedurally valid action sequences under explicit constraints, mid-level multimodal grounding realizes subtasks without delegating spatial decision-making to large language models, and low-level control adapts to physical uncertainty via reinforcement learning from minimal demonstrations. By encoding procedural commitments in interpretable intermediate representations, CAPER prevents execution-time violations of experimental logic, improving controllability, robustness, and data efficiency. Experiments on a scientific workflow benchmark and a public long-horizon manipulation dataset demonstrate consistent improvements in success rate and procedural correctness, particularly in low-data and long-horizon settings.
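The responsibility separation described above can be illustrated with a minimal sketch. All names here (`Constraint`, `violates`, `execute`, the pipetting example) are hypothetical illustrations, not the paper's actual interface: the point is only that procedural constraints are checked at the task level, before any subtask is handed down, so a protocol violation is rejected at plan time rather than discovered during execution.

```python
from dataclasses import dataclass

# Hypothetical sketch of a responsibility-separated pipeline in the spirit of
# CAPER: the task level validates the plan against explicit ordering
# constraints *before* execution; mid-level grounding and low-level control
# are stubbed out, since the procedural check is the focus of the sketch.

@dataclass
class Constraint:
    """An ordering constraint: `before` must precede `after` in the plan."""
    before: str
    after: str

def violates(plan, constraints):
    """Return the first ordering constraint the plan violates, or None."""
    index = {action: i for i, action in enumerate(plan)}
    for c in constraints:
        if c.before in index and c.after in index and index[c.before] > index[c.after]:
            return c
    return None

def execute(plan, constraints):
    """Task level: reject procedurally invalid plans up front; otherwise
    hand each subtask to (stubbed) mid-level grounding and low-level control."""
    bad = violates(plan, constraints)
    if bad is not None:
        raise ValueError(f"plan violates constraint: {bad.before} before {bad.after}")
    log = []
    for action in plan:
        # Mid-level multimodal grounding and low-level RL control would run
        # here; the stub just records that the subtask was dispatched.
        log.append(f"executed {action}")
    return log

# Illustrative protocol: reagent pipetting must precede sealing the plate.
constraints = [Constraint("pipette_reagent", "seal_plate")]
ok = execute(["pipette_reagent", "seal_plate"], constraints)
```

A plan that seals the plate before pipetting is rejected by `execute` before any motion is issued, which is the sense in which procedural commitments are enforced ahead of, rather than during, execution.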