Recent advances in large language models (LLMs) give robots the contextual reasoning needed to comprehend human instructions. Yet current LLM-enabled robots typically depend on cloud-based models or high-performance computing infrastructure, which limits their deployment on platforms with unreliable internet connectivity or constrained computational resources, such as UAVs and small ground vehicles. Fine-tuned small language models (SLMs) that can run onboard therefore offer a promising alternative. This paper introduces Ro-SLM, a framework that enables reliable SLM-driven robot operation by distilling the knowledge and reasoning of LLMs. Ro-SLM begins with dataset synthesis, using LLMs to generate diverse task instructions, produce the corresponding ground-truth code with minimal human assistance, and augment the instructions into real-world application scenarios. The SLM is then fine-tuned on this dataset, with an LLM serving as a reward function to guide training. Extensive experiments on UAV operation tasks demonstrate that Ro-SLM raises the SLM from being unable to support robotic task planning and code generation to performance approaching that of an LLM.
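The two stages described in the abstract (dataset synthesis, then reward-guided fine-tuning) can be pictured as a simple data flow. The following is a minimal sketch only, not the authors' implementation: the abstract does not specify prompts, models, or the training algorithm, so every name here (`Sample`, `synthesize_dataset`, `rank_candidates`, and the `stub_*` functions standing in for LLM calls) is a hypothetical placeholder.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Sample:
    instruction: str  # natural-language task instruction
    code: str         # reference ("ground truth") robot code


def synthesize_dataset(
    generate_instructions: Callable[[int], List[str]],
    generate_code: Callable[[str], str],
    augment: Callable[[str], List[str]],
    n: int,
) -> List[Sample]:
    """Stage 1 (assumed shape): instructions -> reference code -> augmented variants."""
    samples = []
    for instruction in generate_instructions(n):
        code = generate_code(instruction)
        # Each augmented phrasing shares the reference code of its base instruction.
        for variant in [instruction, *augment(instruction)]:
            samples.append(Sample(variant, code))
    return samples


def rank_candidates(
    sample: Sample,
    candidates: List[str],
    reward_fn: Callable[[str, str, str], float],
) -> List[Tuple[float, str]]:
    """Stage 2 (assumed shape): score SLM outputs with an LLM-backed reward, best first."""
    scored = [(reward_fn(sample.instruction, sample.code, c), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical stand-ins for LLM calls, so the sketch runs without any model.
def stub_instructions(n: int) -> List[str]:
    return [f"fly to waypoint {i}" for i in range(n)]


def stub_code(instruction: str) -> str:
    return f"# code for: {instruction}"


def stub_augment(instruction: str) -> List[str]:
    return [f"During a survey mission, {instruction}"]


def stub_reward(instruction: str, reference: str, candidate: str) -> float:
    return 1.0 if candidate == reference else 0.0


dataset = synthesize_dataset(stub_instructions, stub_code, stub_augment, n=2)
ranked = rank_candidates(dataset[0], ["bad output", dataset[0].code], stub_reward)
```

In a real system the ranked rewards would feed whatever fine-tuning objective is chosen; the sketch stops at scoring because the abstract does not say how the reward is consumed.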