Generalist robots that interpret natural-language instructions and execute diverse operations are becoming a reality. However, validating them remains challenging: each task induces its own operational context and correctness specification, which violates the assumptions of traditional validation methods. We propose a two-layer validation framework that combines abstract reasoning with concrete system falsification. At the abstract layer, we model the world in situation calculus and derive weakest preconditions, which enables constraint-aware combinatorial testing to systematically generate diverse, semantically valid world-task configurations with controllable coverage strength. At the concrete layer, these configurations are instantiated in simulation and used for falsification with signal temporal logic (STL) monitoring. Experiments on tabletop manipulation tasks show that our framework uncovers failure cases in the NVIDIA GR00T controller, demonstrating its promise for validating general-purpose robot autonomy.
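To make the two layers concrete, the following is a minimal sketch and not the framework's actual implementation. It assumes a hypothetical parameter space (`PARAMS`), uses a hand-written `is_valid` predicate as a stand-in for constraints that would be derived from weakest preconditions, picks a greedy strategy for pairwise coverage (one of several possible generators), and evaluates the robustness of a single STL "eventually" formula on a made-up distance trace. All names, rules, and values are illustrative assumptions.

```python
from itertools import combinations, product

# Hypothetical world-task parameters, invented for illustration.
PARAMS = {
    "object": ["cube", "mug", "drawer"],
    "action": ["pick", "open"],
    "gripper": ["empty", "holding"],
    "clutter": ["none", "low", "high"],
}

def is_valid(cfg):
    """Stand-in for a precondition check; both rules are assumptions."""
    # Only a drawer can be opened, and a drawer cannot be picked up.
    if (cfg["action"] == "open") != (cfg["object"] == "drawer"):
        return False
    # Picking requires an empty gripper.
    if cfg["action"] == "pick" and cfg["gripper"] == "holding":
        return False
    return True

def tuples_of(cfg, strength):
    """All combinations of `strength` (parameter, value) pairs present in cfg."""
    return set(combinations(sorted(cfg.items()), strength))

def generate_suite(params, strength=2):
    """Greedily pick valid configurations until every feasible tuple is covered."""
    names = list(params)
    valid = [dict(zip(names, vals)) for vals in product(*params.values())]
    valid = [c for c in valid if is_valid(c)]
    # Only tuples that occur in some valid configuration need to be covered.
    uncovered = set().union(*(tuples_of(c, strength) for c in valid))
    suite = []
    while uncovered:
        best = max(valid, key=lambda c: len(tuples_of(c, strength) & uncovered))
        suite.append(best)
        uncovered -= tuples_of(best, strength)
    return suite

def robustness_eventually_below(trace, threshold):
    """Robustness of the STL formula F (x <= threshold) on a finite trace."""
    return max(threshold - x for x in trace)

suite = generate_suite(PARAMS, strength=2)
print(len(suite), "configurations cover all feasible pairs")

# Made-up gripper-to-target distances (metres) from one hypothetical rollout.
distances = [0.30, 0.22, 0.15, 0.09, 0.12]
print("falsified:", robustness_eventually_below(distances, 0.05) < 0)
```

Raising `strength` increases coverage at the cost of a larger suite. A negative robustness value means the trace violates the formula, so the corresponding configuration counts as a failure case.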