Although pre-trained language models encode generic knowledge beneficial for planning and control, they may fail to generate appropriate control policies for domain-specific tasks. Existing fine-tuning methods use human feedback to address this limitation; however, sourcing human feedback is labor-intensive and costly. We present a fully automated approach to fine-tune pre-trained language models for applications in autonomous systems, bridging the gap between generic knowledge and domain-specific requirements while reducing cost. The method synthesizes automaton-based controllers from pre-trained models guided by natural language task descriptions. These controllers are verifiable against independently provided specifications within a world model, which can be abstract or obtained from a high-fidelity simulator. Controllers that comply more closely with the desired specifications receive higher ranks, which guide the iterative fine-tuning process. We provide quantitative evidence, primarily in autonomous driving, to demonstrate the method's effectiveness across multiple tasks. The results indicate an improvement in the percentage of specifications satisfied by the controller from 60% to 90%.
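The ranking step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the controller is simplified to a memoryless observation-to-action mapping rather than a full automaton, each specification is a plain Python predicate standing in for a formal verification query against a world model, and the names `compliance` and `preference_pairs` are hypothetical. The abstract does not state how ranks are consumed during fine-tuning; the sketch assumes they are turned into (preferred, rejected) pairs.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# Hypothetical simplification: a controller maps an observation to an action.
Controller = Dict[str, str]
# Hypothetical stand-in for a verifier call: True if the spec is satisfied.
Spec = Callable[[Controller], bool]


def compliance(controller: Controller, specs: List[Spec]) -> float:
    """Return the fraction of specifications the controller satisfies."""
    return sum(spec(controller) for spec in specs) / len(specs)


def preference_pairs(
    candidates: List[Controller], specs: List[Spec]
) -> List[Tuple[Controller, Controller]]:
    """Build (preferred, rejected) pairs; ties produce no pair."""
    scored = [(compliance(c, specs), c) for c in candidates]
    pairs = []
    for (score_a, a), (score_b, b) in combinations(scored, 2):
        if score_a > score_b:
            pairs.append((a, b))
        elif score_b > score_a:
            pairs.append((b, a))
    return pairs


# Toy driving specifications (illustrative only).
specs: List[Spec] = [
    lambda c: c.get("red_light") == "stop",
    lambda c: c.get("pedestrian") == "stop",
    lambda c: c.get("green_light") == "go",
]

# Three candidate controllers, e.g. parsed from three model responses.
candidates: List[Controller] = [
    {"red_light": "stop", "pedestrian": "stop", "green_light": "go"},
    {"red_light": "stop", "pedestrian": "go", "green_light": "go"},
    {"red_light": "go", "pedestrian": "go", "green_light": "go"},
]

pairs = preference_pairs(candidates, specs)
for preferred, rejected in pairs:
    print(compliance(preferred, specs), ">", compliance(rejected, specs))
```

In a real pipeline the predicates would presumably be replaced by checks of the synthesized automaton against the world model, and the resulting pairs would feed whichever fine-tuning procedure is used.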