Long-horizon precision manipulation in laboratory automation, such as pipette tip attachment and liquid transfer, requires policies that respect strict procedural logic while operating in continuous, high-dimensional state spaces. However, existing approaches struggle with reward sparsity, multi-stage structural constraints, and noisy or imperfect demonstrations, leading to inefficient exploration and unstable convergence. We propose a Keyframe-Guided Reward Generation Framework that automatically extracts kinematics-aware keyframes from demonstrations, generates stage-wise targets via a diffusion-based predictor in latent space, and constructs a geometric progress-based reward to guide online reinforcement learning. The framework integrates multi-view visual encoding, latent similarity-based progress tracking, and human-in-the-loop reinforcement fine-tuning on a Vision-Language-Action backbone to align policy optimization with the intrinsic stepwise logic of biological protocols. Across four real-world laboratory tasks, including high-precision pipette attachment and dynamic liquid transfer, our method achieves an average success rate of 82% after 40--60 minutes of online fine-tuning. It substantially outperforms HG-DAgger (42%) and Hil-ConRFT (47%), demonstrating the effectiveness of structured keyframe-guided rewards in overcoming exploration bottlenecks and providing a scalable solution for high-precision, long-horizon robotic laboratory automation.
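The latent similarity-based progress tracking described above can be illustrated with a minimal sketch. This is a hypothetical implementation, not the paper's code: the function name `progress_reward`, the use of cosine similarity, and the regression-penalizing reward shaping are all assumptions; the stage-wise target latents stand in for the outputs of the diffusion-based predictor.

```python
import numpy as np

def progress_reward(z_t, keyframe_latents, prev_progress):
    """Hypothetical sketch of a latent similarity-based progress reward.

    z_t              -- current latent embedding of the observation
    keyframe_latents -- stage-wise target latents (e.g. from a diffusion predictor)
    prev_progress    -- progress estimate from the previous timestep
    """
    # Cosine similarity between the current latent and each stage-wise target.
    sims = [
        float(np.dot(z_t, k) / (np.linalg.norm(z_t) * np.linalg.norm(k) + 1e-8))
        for k in keyframe_latents
    ]
    # Geometric progress: index of the best-matching keyframe plus the
    # within-stage similarity, normalized by the number of stages.
    stage = int(np.argmax(sims))
    progress = (stage + max(sims[stage], 0.0)) / len(keyframe_latents)
    # Dense reward is the change in progress; regressions yield negative reward.
    reward = progress - prev_progress
    return reward, progress
```

Rewarding the *change* in progress rather than progress itself keeps the return telescoping over a trajectory, which is one common way to densify a sparse task reward without altering the optimal policy.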