Robotic laboratories play a critical role in autonomous scientific discovery by enabling scalable, continuous experimental execution. Recent vision-language-action (VLA) models offer a promising foundation for robotic laboratories. However, scientific experiments typically involve long-horizon tasks composed of multiple atomic tasks, posing a fundamental challenge to existing VLA models. While VLA models fine-tuned for scientific tasks can reliably execute atomic experimental actions seen during training, they often fail on composite tasks formed by reordering and composing these known atomic actions. This limitation arises from a distributional mismatch between training-time atomic tasks and inference-time composite tasks, which prevents VLA models from executing the necessary transitional operations between atomic tasks. To address this challenge, we propose an Agentic VLA Inference Plugin for Long-Horizon Tasks in Scientific Experiments. It introduces an LLM-based agentic inference mechanism that intervenes during the execution of sequential manipulation tasks. By performing explicit transition inference and generating transitional robotic action code, the proposed plugin guides VLA models through missing transitional steps, enabling reliable execution of composite scientific workflows without any additional training. This inference-only intervention makes our method computationally efficient, data-efficient, and well suited to open-ended, long-horizon robotic laboratory tasks. We build 3D assets of scientific instruments and common scientific operating scenes within an existing simulation environment. In these scenes, we verify that our method increases the average per-atomic-task success rate by 42\% at inference time. Furthermore, we show that our method transfers readily from simulation to real scientific laboratories.
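The inference-only intervention described above can be sketched as a simple control loop: between consecutive atomic tasks, an LLM agent inspects the gap and, when needed, emits transitional action code that runs before the VLA model executes the next atomic task. The following is a minimal, hypothetical illustration under stated assumptions; all class and method names (`Robot`, `VLAModel`, `LLMAgent`, `infer_transition`) are placeholders of our own invention, not the paper's actual API, and the stub agent hard-codes one transition rule where the real system would query an LLM.

```python
class Robot:
    """Stub robot that records every action it performs."""
    def __init__(self):
        self.log = []

    def act(self, action):
        self.log.append(action)


class VLAModel:
    """Stub VLA policy: executes an atomic task seen during training."""
    def execute(self, task, robot):
        robot.act(f"atomic:{task}")


class LLMAgent:
    """Stand-in for LLM-based transition inference (hypothetical)."""
    def infer_transition(self, prev_task, next_task, robot_state):
        # The real system would prompt an LLM with the task pair and the
        # current observation; here one example rule is hard-coded:
        # release and retract before starting a new grasp after pouring.
        if prev_task.startswith("pour") and next_task.startswith("grasp"):
            return "robot.act('transition:open_gripper_and_retract')"
        return None  # no transitional operation needed


def run_composite_task(atomic_tasks, vla, agent, robot):
    """Inference-only loop: no retraining; transitions inserted on the fly."""
    for i, task in enumerate(atomic_tasks):
        if i > 0:
            code = agent.infer_transition(atomic_tasks[i - 1], task, robot.log)
            if code:
                # Execute the generated transitional robot action code.
                exec(code, {"robot": robot})
        vla.execute(task, robot)
    return robot.log


log = run_composite_task(
    ["grasp beaker", "pour into flask", "grasp stirrer"],
    VLAModel(), LLMAgent(), Robot())
```

Running the sketch inserts the transitional gripper-release action between the pour and the second grasp, while the two distribution-matched atomic tasks before it execute unmodified, mirroring how the plugin fills only the missing transitional steps.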