Rapid advances in large language models (LLMs) have opened unprecedented opportunities for human-drone interaction. In this paper, we propose a method that integrates a fine-tuned CodeT5 model with the Unreal Engine-based AirSim drone simulator to execute multi-task operations efficiently from natural language commands. This approach enables users to interact with simulated drones through prompts or command descriptions, allowing them to query and control the drone's state with ease and significantly lowering the barrier to operation. Within the AirSim simulator, we can flexibly construct visually realistic, dynamic environments that simulate drone applications in complex scenarios. We fine-tune CodeT5 on a large dataset of (natural language, program code) command-execution pairs generated by ChatGPT, combined with developer-written drone code, to achieve automated translation from natural language to executable code for drone tasks. Experimental results demonstrate that the proposed method achieves superior task-execution efficiency and command-understanding capability in simulated environments. In the future, we plan to extend the model's functionality in a modular fashion, enhancing its adaptability to complex scenarios and advancing the application of drone technologies in real-world environments.
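The pipeline described above can be sketched minimally as follows. This is an illustrative assumption, not the paper's actual implementation: the fine-tuned CodeT5 model is stubbed here with a small lookup table so the example is self-contained, and the generated strings use real AirSim Python API calls (`takeoffAsync`, `moveToPositionAsync`, `landAsync`); in the real system the `translate` step would invoke the fine-tuned model.

```python
# Hypothetical sketch of the natural-language -> AirSim code pipeline.
# The fine-tuned CodeT5 model is replaced by a lookup table of
# (natural language, program code) pairs, in the style of the training data.

COMMAND_TO_CODE = {
    "take off": "client.takeoffAsync().join()",
    "land": "client.landAsync().join()",
    "fly to (10, 5, -20)": "client.moveToPositionAsync(10, 5, -20, 5).join()",
}

def translate(command: str) -> str:
    """Stand-in for the fine-tuned CodeT5 model: map a natural-language
    drone command to a line of executable AirSim Python code."""
    code = COMMAND_TO_CODE.get(command.strip().lower())
    if code is None:
        raise ValueError(f"unrecognized command: {command!r}")
    return code

if __name__ == "__main__":
    # In the full system the generated code would be executed against a
    # connected airsim.MultirotorClient inside the simulator.
    for cmd in ("take off", "fly to (10, 5, -20)", "land"):
        print(f"{cmd!r} -> {translate(cmd)}")
```

In the actual method, `translate` would run a sequence-to-sequence forward pass of the fine-tuned CodeT5 checkpoint, and the resulting code would be executed against a live `airsim.MultirotorClient` connected to the Unreal Engine simulation.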