A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers

In this paper we propose a novel virtual simulation-pilot engine that speeds up air traffic controller (ATCo) training by integrating several state-of-the-art artificial intelligence (AI) tools. The engine receives spoken communications from ATCo trainees and performs automatic speech recognition and understanding, so it goes beyond transcribing a communication and also extracts its meaning. The output is then sent to a response generator, which produces a spoken read-back resembling the one a pilot would give to the ATCo trainee. The overall pipeline is composed of the following submodules: (i) an automatic speech recognition (ASR) system that transforms audio into a sequence of words; (ii) a high-level air traffic control (ATC) entity parser that understands the transcribed voice communication; and (iii) a text-to-speech submodule that generates a pilot-like spoken utterance based on the situation of the dialogue. Our system employs state-of-the-art AI-based tools such as Wav2Vec 2.0, Conformer, BERT and Tacotron models. To the best of our knowledge, this is the first work fully based on open-source ATC resources and AI tools. In addition, we have developed a robust and modular system with optional submodules that can enhance the system's performance by incorporating real-time surveillance data or metadata related to exercises (such as sectors or runways), or that can introduce deliberate read-back errors to train ATCo trainees to identify them. Our ASR system reaches word error rates (WER) as low as 5.5% and 15.9% on high- and low-quality ATC audio, respectively. We also demonstrate that adding surveillance data into the ASR can yield callsign detection accuracy of more than 96%.
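For readers unfamiliar with the WER figures quoted above, the following is a minimal sketch of the standard word error rate: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and an ASR hypothesis, divided by the number of reference words. The abstract does not say which scoring tool was used, so this assumes the standard definition; the function name `word_error_rate` and the example transcripts are hypothetical illustrations, not taken from the evaluation data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical ATC-style transcripts, made up for illustration only.
reference = "lufthansa three two one descend flight level eight zero"
hypothesis = "lufthansa three two one descend level eight zero"
wer = word_error_rate(reference, hypothesis)  # one deleted word out of nine
```

Under this definition, a WER of 5.5% corresponds to roughly one word error for every 18 reference words. Because insertions count as errors, the value can exceed 100% when the hypothesis is much longer than the reference.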
