Ground-air negotiation via speech communication is a prerequisite for safe and efficient air traffic control (ATC) operations. However, as traffic flow increases, incorrect instructions caused by human factors pose a serious threat to ATC safety. Existing flight trajectory prediction (FTP) approaches rely primarily on the flight status recorded in historical trajectories, so their predictions respond to a real-time maneuvering instruction only after a significant delay, which hinders conflict detection. A major reason is that spoken instructions and flight trajectories are represented in different modalities in current ATC systems, which makes it difficult to account for maneuvering instructions in FTP tasks. This paper proposes a spoken instruction-aware FTP framework, called SIA-FTP, that supports high-maneuvering FTP tasks by incorporating instant spoken instructions. To bridge the modality gap and minimize data requirements, a 3-stage learning paradigm implements the SIA-FTP framework progressively: trajectory-based FTP pretraining, intent-oriented instruction embedding learning, and multi-modal finetuning. Specifically, in the 1st stage the FTP model is pre-trained on large volumes of well-resourced trajectory data, and in the 2nd stage the instruction embedding with maneuvering semantics is pre-trained on text data. Next, a multi-modal fusion strategy incorporates the pre-trained instruction embedding into the FTP model and integrates the two pre-trained networks into a joint model. Finally, the joint model is finetuned on the limited trajectory-instruction data to improve FTP performance in maneuvering instruction scenarios. Experimental results demonstrate that the proposed framework yields a marked performance improvement in high-maneuvering scenarios.
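The 3-stage paradigm described in the abstract can be written down as a small schedule, which makes explicit which data each stage consumes and which parts are trained for the first time in each stage. This is only an illustrative sketch, not the authors' implementation: the component labels (`ftp_model`, `instruction_embedding`, `fusion`) and the helper names are hypothetical, and it assumes that the final finetuning stage updates the whole joint model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str       # stage name as given in the abstract
    data: str       # kind of data the stage consumes
    updates: tuple  # hypothetical labels of the components trained in this stage


# Component labels are illustrative, not identifiers from the paper.
SCHEDULE = (
    Stage("trajectory-based FTP pretraining", "trajectory", ("ftp_model",)),
    Stage("intent-oriented instruction embedding learning", "instruction text",
          ("instruction_embedding",)),
    # Assumption: finetuning updates the whole joint model, fusion part included.
    Stage("multi-modal finetuning", "paired trajectory-instruction",
          ("ftp_model", "instruction_embedding", "fusion")),
)


def pretrained_before(index):
    """Return the components already trained by the stages preceding `index`."""
    done = set()
    for stage in SCHEDULE[:index]:
        done.update(stage.updates)
    return done


def new_components(index):
    """Return the components that stage `index` trains for the first time."""
    return set(SCHEDULE[index].updates) - pretrained_before(index)


if __name__ == "__main__":
    for i, stage in enumerate(SCHEDULE):
        print(i + 1, stage.name, "| data:", stage.data,
              "| new:", sorted(new_components(i)))
```

Under these assumptions, only the fusion part is new when finetuning starts, which is consistent with the abstract's motivation for needing only limited paired trajectory-instruction data in the last stage.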