Air traffic extends ground transportation into three-dimensional (3D) space, in which aircraft fly above the earth. Since no signs or traffic signals can be installed to guide flights in the air, the pilot is almost “blind” once the aircraft has taken off, with few means of obtaining the traffic situation around the aircraft. To address this issue, air traffic control (ATC) was established to ensure flight safety within a local airspace area. Various infrastructures have been developed to collect the global air traffic situation (radar) and to transmit information between air and ground (communication). Based on the real-time traffic situation and a set of well-designed ATC rules, the air traffic controller (ATCO) directs flights to their destinations in a safe and highly efficient manner.
Although considerable effort has been devoted to building a reliable controller-pilot data link communication (CPDLC) system [1], digital data transmission remains an unresolved problem for ATC communication. In the current ATC procedure, speech communication over radio is still the primary way to exchange information between the ATCO and the aircrew. Spoken instructions are therefore transmitted in analog form and are easily affected by environmental factors, such as communication conditions and equipment errors. Spoken instructions also carry a wealth of contextual situational information that indicates how the flight and the surrounding traffic will evolve [2][3], which is highly significant for air traffic operation.
However, speech communication is also a typical human-in-the-loop (HITL) procedure in the ATC loop, since current ATC systems cannot process speech signals directly. Any speech error may cause a misunderstanding between the ATCO and the aircrew [4][5]. Because communication is the first step in executing an ATC instruction, a misunderstanding is likely to result in incorrect aircraft motion states and may further induce a potential conflict (safety risk) during air traffic operation. According to statistics released by EUROCONTROL [6], up to 30% of all incidents are related to speech communication errors (rising to 50% in airport environments), and 40% of all runway incursions also involve communication problems. Understanding spoken instructions is therefore particularly important for detecting potential risks and improving ATC safety.
The main purpose of understanding spoken instructions is to obtain near-future traffic dynamics in advance and to detect communication errors that may cause potential safety risks. This not only enriches the information sources of the current ATC system but can also provide reliable warnings before the pilot executes an incorrect instruction, giving more prewarning time.
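One way to make the early-warning idea concrete is a slot-by-slot comparison between what the ATCO said and what the pilot read back. The sketch below assumes both utterances have already been converted into structured slots by an SIU pipeline; the `Instruction` type, its three slots, and the example callsign are hypothetical illustrations, not the representation used in the source paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instruction:
    """Hypothetical structured form of one spoken instruction after SIU."""
    callsign: str          # aircraft identity, e.g. "CCA1234" (illustrative)
    intent: str            # e.g. "climb", "descend", "heading"
    value: Optional[int]   # e.g. flight level or heading in degrees


def readback_errors(controller: Instruction, readback: Instruction) -> List[str]:
    """Return the slots where the pilot readback disagrees with the ATCO instruction."""
    errors = []
    for slot in ("callsign", "intent", "value"):
        if getattr(controller, slot) != getattr(readback, slot):
            errors.append(slot)
    return errors


if __name__ == "__main__":
    atco = Instruction("CCA1234", "climb", 310)
    pilot = Instruction("CCA1234", "climb", 330)  # pilot read back the wrong level
    print(readback_errors(atco, pilot))  # ['value']
```

A non-empty result could be raised as a warning before the aircraft starts to maneuver; a real system would also need to handle partial readbacks and recognition uncertainty, which this sketch ignores.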
In Figure 1, the upper part presents the typical ATC communication procedure, while the lower part illustrates the spoken instruction understanding (SIU) task required in the ATC domain. In general, SIU consists of two main steps, automatic speech recognition (ASR) and language understanding (LU) [7][8], as described below:
Figure 1. The air traffic control (ATC) procedure and spoken instruction understanding (SIU) roles in ATC.
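The two-step structure can be illustrated by treating ASR as a black box that yields a transcript and writing a toy LU step that maps that transcript to slots. The sketch below is rule-based and only for illustration: the airline telephony table, the intent keywords, and the slot names are hypothetical, and the source paper does not prescribe this approach (practical systems typically use learned models for both steps).

```python
from typing import Dict, List, Optional

# Spoken digits, including ICAO radiotelephony variants such as "niner".
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "tree": "3",
          "four": "4", "five": "5", "fife": "5", "six": "6", "seven": "7",
          "eight": "8", "nine": "9", "niner": "9"}

# Hypothetical mapping from airline telephony designators to ICAO codes.
AIRLINES = {"air china": "CCA", "speedbird": "BAW"}

# Hypothetical action keywords mapped to intent labels.
INTENTS = {"climb": "climb", "descend": "descend", "turn": "heading"}


def read_digits(tokens: List[str], start: int) -> str:
    """Concatenate consecutive spoken digits beginning at tokens[start]."""
    out = []
    for tok in tokens[start:]:
        if tok not in DIGITS:
            break
        out.append(DIGITS[tok])
    return "".join(out)


def understand(transcript: str) -> Dict[str, Optional[object]]:
    """Toy LU step: map an ASR transcript to callsign, intent, and value slots."""
    tokens = transcript.lower().split()
    slots: Dict[str, Optional[object]] = {"callsign": None, "intent": None, "value": None}
    # Callsign: airline telephony prefix followed by spoken digits.
    for phrase, code in AIRLINES.items():
        prefix = phrase.split()
        if tokens[:len(prefix)] == prefix:
            slots["callsign"] = code + read_digits(tokens, len(prefix))
            break
    # Intent: first recognised action keyword.
    for tok in tokens:
        if tok in INTENTS:
            slots["intent"] = INTENTS[tok]
            break
    # Value: spoken digits following "level" or "heading".
    for keyword in ("level", "heading"):
        if keyword in tokens:
            digits = read_digits(tokens, tokens.index(keyword) + 1)
            if digits:
                slots["value"] = int(digits)
            break
    return slots


if __name__ == "__main__":
    # Stand-in for the ASR output of one ATCO transmission.
    asr_text = "air china one two three four climb to flight level three one zero"
    print(understand(asr_text))
    # {'callsign': 'CCA1234', 'intent': 'climb', 'value': 310}
```

Hand-written rules like these break quickly on real, noisy ATC speech; the sketch is only meant to show where the ASR output ends and the LU output begins.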
In addition, since ATC communication is a multi-speaker, multi-turn conversation, voiceprint recognition (VPR) is also needed to distinguish the identities of different speakers so that the LU task can correlate different instructions within the same sector. VPR can also be applied for security purposes. For instance, if an ATCO instruction intended for flight A is answered by the aircrew of flight B (usually a flight with a similar aircraft identity), potential risks may arise from the mismatched traffic dynamics. VPR is expected to detect such situations from the vocal features of different speakers, thereby preventing potential flight conflicts and improving operational safety.
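The flight A / flight B scenario can be sketched as a speaker-verification check. The example below assumes each aircrew's voice has already been enrolled as a fixed-length speaker embedding by some upstream model (not shown) and scores the responding voice against every enrolled voiceprint with cosine similarity, a common verification backend. The embeddings, callsigns, and the 0.7 threshold are invented for illustration and do not come from the source paper.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify_response(expected_flight: str,
                    response: Sequence[float],
                    voiceprints: Dict[str, Sequence[float]],
                    threshold: float = 0.7) -> Tuple[bool, str]:
    """Check whether a response was spoken by the aircrew the ATCO addressed.

    Returns (ok, best_match). ok is False when the closest enrolled voiceprint
    belongs to another flight, or when the expected flight scores below the
    (hypothetical) threshold.
    """
    scores = {flight: cosine_similarity(emb, response)
              for flight, emb in voiceprints.items()}
    best_match = max(scores, key=scores.get)
    ok = best_match == expected_flight and scores[expected_flight] >= threshold
    return ok, best_match


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real speaker embeddings are much larger.
    voiceprints = {"CCA1234": [0.9, 0.1, 0.0], "CCA1243": [0.1, 0.9, 0.2]}
    response = [0.15, 0.85, 0.25]  # actually spoken by the CCA1243 crew
    print(verify_response("CCA1234", response, voiceprints))  # (False, 'CCA1243')
```

Returning the best-matching flight alongside the verdict lets a warning name the aircraft that most likely took the instruction, which is the information a controller would need to intervene.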
Historically, new techniques have not been applied to the ATC domain promptly, owing to various limitations (safety requirements, complex environments, etc.). Although numerous academic studies on speech instructions have been reported in the ATC domain [9][10][11][12][13], no real industrial ATC system currently performs valid processing of speech instructions. Speech communication serves only as evidence for post-event analysis, which fails to exploit its important role in improving air traffic safety. Fortunately, thanks to the large amount of stored industrial data and the widespread application of information technology, it is now possible to obtain extra real-time traffic information from speech communication and use it to support air traffic operation.
Based on the aforementioned technical challenges and existing works, possible future research topics related to the SIU task are outlined from the perspectives of automatic speech recognition, language understanding, and voiceprint recognition, as summarized below:
This entry is adapted from the peer-reviewed paper 10.3390/aerospace8030065