This paper presents the development and comparative evaluation of three voice-command pipelines for controlling a Tello drone with speech recognition and deep learning. The aim is to improve human-machine interaction by enabling intuitive voice control of drone actions. The three pipelines are: (1) a conventional Speech-to-Text (STT) stage followed by a Large Language Model (LLM), (2) a model that maps voice input directly to drone functions, and (3) a system based on a Siamese neural network. Each pipeline was evaluated on inference time, accuracy, efficiency, and flexibility. We describe the methodology, dataset preparation, and evaluation metrics in detail, and analyze the strengths of each pipeline and the scenarios to which it is best suited.
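To make the third pipeline more concrete, the following is a minimal sketch of how a Siamese-network system can classify a spoken command at inference time: the incoming utterance's embedding is compared against reference embeddings for each known command, and the closest match is accepted only if it is similar enough. This is not the authors' implementation. It assumes a trained Siamese encoder (not shown) that maps each utterance to a fixed-length vector, a few reference recordings per command, cosine similarity as the comparison metric, and a rejection threshold of 0.8. The names `match_command` and `cosine_similarity`, the command labels, and the threshold are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_command(
    query: Sequence[float],
    references: Dict[str, List[Sequence[float]]],
    threshold: float = 0.8,  # hypothetical rejection threshold
) -> Optional[str]:
    """Return the command label whose reference embedding is closest to the query.

    `query` and every reference vector are assumed to come from the same
    (hypothetical) trained Siamese encoder. Returns None when even the best
    match falls below `threshold`, so unknown utterances are rejected rather
    than mapped to the nearest command.
    """
    best_label: Optional[str] = None
    best_score = -1.0
    for label, embeddings in references.items():
        for ref in embeddings:
            score = cosine_similarity(query, ref)
            if score > best_score:
                best_label, best_score = label, score
    return best_label if best_score >= threshold else None


if __name__ == "__main__":
    # Toy 3-dimensional embeddings standing in for real encoder outputs.
    refs = {
        "takeoff": [[1.0, 0.0, 0.0]],
        "land": [[0.0, 1.0, 0.0]],
    }
    print(match_command([0.9, 0.1, 0.0], refs))
    print(match_command([0.0, 0.0, 1.0], refs))
```

One reason to use a similarity threshold instead of a fixed classifier head is that new commands can, in principle, be added by recording a few reference utterances and adding their embeddings to `references`, without retraining the encoder.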