Air traffic control (ATC) is a complex, time-varying system in which operational safety remains a central research topic. A single safety incident can outweigh all other achievements of an ATC center. Air traffic safety is affected by many aspects of air traffic operation, from mechanical maintenance and resource management to air traffic control itself. The safety of air traffic control is particularly important because the aircraft is already in the air. Any effort that improves ATC safety is therefore worthwhile.
Air traffic is an extension of ground transportation in which aircraft fly in three-dimensional (3D) airspace. Since no signs or traffic signals can be placed in the air to guide flights, the pilot is almost “blind” once the aircraft has taken off, with few means of obtaining the traffic situation around the aircraft. To address this issue, a dedicated position, air traffic control, was established to ensure flight safety within a local airspace. Various infrastructures were developed to collect the global air traffic situation (radar) and to transmit information between air and ground (communication). Based on the real-time traffic situation and a set of well-designed ATC rules, the air traffic controller (ATCO) directs flights to their destinations in a safe and highly efficient manner.
Although enormous efforts have been made to build a qualified controller-pilot data link communication (CPDLC) system [1], digital data transmission remains a challenge for ATC communication. In the current ATC procedure, speech over radio is still the primary way to exchange information between the ATCO and the aircrew. The spoken instruction is therefore transmitted in an analog manner and is easily degraded by environmental factors such as communication conditions and equipment errors. It also contains a wealth of contextual situational dynamics that indicate the future evolution of the flight and the surrounding traffic [2][3], which is highly significant to air traffic operation.
However, speech communication is also a typical human-in-the-loop (HITL) procedure in the ATC loop, since the current ATC system cannot process the speech signal directly. Any speech error may cause a misunderstanding between the ATCO and the aircrew [4][5]. Because communication is the first step in executing an ATC instruction, a misunderstanding is likely to result in incorrect aircraft motion states and may further induce a potential conflict (safety risk) during air traffic operation. According to statistics released by EUROCONTROL [6], up to 30% of all incidents are related to speech communication errors (rising to 50% in airport environments), and 40% of all runway incursions also involve communication problems. Consequently, understanding the spoken instruction is particularly important for detecting potential risks and further improving ATC safety.
The main purpose of understanding spoken instructions is to obtain the near-future traffic dynamics in advance and to detect communication errors that may cause safety risks. It not only enriches the information sources of the current ATC system but can also provide reliable warnings before the pilot executes an incorrect instruction, leaving more prewarning time.
In Figure 1, the upper part presents the typical ATC communication procedure, while the lower part illustrates the required spoken instruction understanding (SIU) task in the ATC domain. In general, SIU consists of two main steps: automatic speech recognition (ASR) and language understanding (LU) [7][8], as described below:
Figure 1. The air traffic control (ATC) procedure and spoken instruction understanding (SIU) roles in ATC.
(a) ASR: translates the ATCO’s instruction from a speech signal into a text representation (human- or computer-readable). ASR techniques draw on the acoustic model, the language model, and other contextual information.
(b) LU: also known as text instruction understanding, which aims to extract ATC-related elements from the text instruction, i.e., to convert text into ATC-related structured data, since the ATC system cannot process the text directly. The ATC elements are then used to improve the operational safety of air traffic. In general, the LU task can be divided into three parts: role recognition, intent detection, and slot filling (extraction of ATC-related elements such as aircraft identity and altitude).
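To make the LU output concrete, the following minimal sketch maps an already transcribed instruction to structured data with a role, an intent, and slots. It is a rule-based illustration only, not the method of any cited work: the regular expressions, the intent names, the `ParsedInstruction` type, and the role heuristic (controller states the callsign first, pilot readback ends with it) are all simplifying assumptions, and the input is assumed to have digits and callsigns already normalized by the ASR stage.

```python
import re
from dataclasses import dataclass, field

# Hypothetical intent keywords; real ATC phraseology is far richer.
INTENT_PATTERNS = {
    "climb": re.compile(r"\bclimb(ing)?\b"),
    "descend": re.compile(r"\bdescend(ing)?\b"),
    "turn": re.compile(r"\bturn(ing)? (left|right)\b"),
}

# Assumed normalized formats: three-letter airline code plus flight number,
# altitude as digits followed by a unit, heading as three digits.
CALLSIGN = re.compile(r"\b([A-Z]{3}\d{1,4})\b")
ALTITUDE = re.compile(r"\b(\d{3,5}) (feet|meters)\b")
HEADING = re.compile(r"\bheading (\d{3})\b")


@dataclass
class ParsedInstruction:
    role: str                      # "controller", "pilot", or "unknown"
    intent: str                    # key of INTENT_PATTERNS or "unknown"
    slots: dict = field(default_factory=dict)


def understand(text: str) -> ParsedInstruction:
    """Convert a text instruction into ATC-related structured data."""
    text = text.strip()
    slots = {}
    role = "unknown"

    # Slot: aircraft identity. Its position drives the toy role rule.
    match = CALLSIGN.search(text)
    if match:
        slots["callsign"] = match.group(1)
        if match.start() == 0:
            role = "controller"    # assumption: controller addresses callsign first
        elif match.end() == len(text):
            role = "pilot"         # assumption: readback ends with callsign

    lowered = text.lower()

    # Intent detection: first keyword pattern that matches.
    intent = next(
        (name for name, pat in INTENT_PATTERNS.items() if pat.search(lowered)),
        "unknown",
    )

    # Slot filling: altitude and heading, when present.
    alt = ALTITUDE.search(lowered)
    if alt:
        slots["altitude"] = int(alt.group(1))
        slots["altitude_unit"] = alt.group(2)
    hdg = HEADING.search(lowered)
    if hdg:
        slots["heading"] = int(hdg.group(1))

    return ParsedInstruction(role, intent, slots)


if __name__ == "__main__":
    print(understand("CCA1234 climb and maintain 8000 meters"))
    print(understand("climbing 8000 meters CCA1234"))
```

In practice these three sub-tasks are typically learned from annotated data rather than hand-written; the sketch only shows the shape of the structured output that downstream safety checks could consume.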
In addition, since ATC communication is a multi-speaker, multi-turn conversation, voiceprint recognition (VPR) is also needed to distinguish the identities of different speakers, so that the LU task can correlate the instructions issued in the same sector. The VPR technique can also be applied for security purposes. For instance, if an ATCO instruction for flight A is incorrectly acknowledged by the aircrew of flight B (usually because the aircraft identities are similar), a potential risk arises from the mismatched traffic dynamics. VPR is expected to detect this situation from the vocal features of the different speakers and thus help prevent a potential flight conflict (improving operational safety).
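The readback check described above can be sketched as a comparison of speaker embeddings. This is a hedged illustration under stated assumptions: the fixed-length voiceprint vectors are assumed to come from some speaker encoder that is not specified here, and the `check_readback` function, the per-flight enrollment dictionary, and the 0.7 threshold are hypothetical choices rather than part of any deployed ATC system.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def check_readback(addressed, readback_embedding, enrolled, threshold=0.7):
    """Compare the readback speaker with the flight the ATCO addressed.

    addressed          -- callsign the instruction was issued to
    readback_embedding -- voiceprint vector of the replying speaker (assumed given)
    enrolled           -- dict mapping callsign -> previously observed voiceprint
    threshold          -- hypothetical minimum similarity to accept an identity
    Returns (status, best_matching_callsign_or_None).
    """
    scores = {
        flight: cosine(readback_embedding, emb) for flight, emb in enrolled.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return ("unknown_speaker", None)
    if best != addressed:
        return ("mismatch", best)  # another crew answered: raise a warning
    return ("ok", best)


if __name__ == "__main__":
    # Toy three-dimensional voiceprints for two flights with similar callsigns.
    enrolled = {"CCA1234": [1.0, 0.0, 0.0], "CCA1243": [0.0, 1.0, 0.0]}
    print(check_readback("CCA1234", [0.1, 0.9, 0.0], enrolled))
```

A check of this kind would complement, not replace, a text-level comparison of the callsign in the instruction and in the readback, since the two can disagree independently.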
New techniques have long been slow to reach the ATC domain because of various limitations (safety requirements, complex environment, etc.). Although numerous academic studies on speech instruction have been reported in the ATC domain [9][10][11][12][13], no effective processing of speech instructions is currently deployed in a real industrial ATC system. At present, recorded speech communication serves only as evidence in post-event analysis, which does not reflect its potential role in improving air traffic safety. Fortunately, thanks to the large amount of stored industrial data and the widespread application of information technology, it is now possible to obtain extra real-time traffic information from speech communication and thereby contribute to air traffic operation.
Based on the aforementioned technical challenges and existing works, possible future research topics related to the SIU task are outlined from the perspectives of automatic speech recognition, language understanding, and voiceprint recognition, as summarized below: