Sensor networks serve a broad range of applications, from intelligence surveillance to weather forecasting. While most sensor networks are terrestrial, Underwater Sensor Networks (USNs) are an emerging area. One of the unavoidable and growing challenges for modern USN technology is tolerating faults, i.e., accepting that hardware is imperfect and coping with it. Fault Tolerance tends to matter more in underwater than in terrestrial environments, as the latter are generally more forgiving. Moreover, reaching malfunctioning devices under water for replacement and maintenance is harder and more costly.
Underwater Sensor Networks (USNs) have become widespread and are deployed in a wide range of applications, from harbor security to the monitoring of underwater pipelines and fish farms. Because USNs often operate in extremely harsh environments, and many of their applications are safety-critical, it is imperative to develop techniques that enable these networks to tolerate faults. Moreover, USNs face many challenges that are not present in terrestrial networks, such as the virtual inapplicability of wireless radio communication under water and the limitations of acoustic communication.
In the following, we present taxonomies of the sources of faults and of the Fault Tolerance tasks. The purpose of these taxonomies is to categorize the articles covered by the current survey.
The objective of the current section is to define a taxonomy of Fault Tolerance tasks to help categorize the identified papers. The Fault Tolerance tasks are based on the more general Fault Tolerance principles from References [18][19]. Figure 62 shows the taxonomy of Fault Tolerance tasks applicable in USNs and how they affect each other. While the design and initial deployment of USNs contribute to Fault Prevention and Prediction abilities, data collection techniques at run-time also contribute to the Fault Detection and Fault Recovery stages of the system, all of which are discussed in the current paper.
The techniques under consideration can be categorized into the following groups:
The overview of fault tolerant techniques presented in the following section follows the above-described taxonomy.
After Fault Detection, Fault Identification, and Fault Diagnosis, a fault handling stage can be entered [38] to prevent further data corruption and system deterioration. Fault handling consists of Fault Isolation, Masking, and Recovery. Fault handling can hide the occurrence of a fault from other components by applying Fault Masking; the key techniques for such masking are informational, time, and physical redundancy [18]. The masking technique proposed for underwater vehicles is Triple Modular Redundancy (TMR) [48], which is also one of the most commonly used Fault Masking techniques. Isolating a faulty component from the others can be facilitated by using virtualization [18]. In large-scale distributed systems, frozen virtual images of healthy services have been used as checkpoints [49] for rolling back when a fault occurs.
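To illustrate how physical redundancy can mask a fault, the following is a minimal sketch of a majority voter over three replicas, as used in Triple Modular Redundancy. It is not taken from Reference [48]; the names `tmr_vote`, `run_tmr`, `healthy`, and `stuck_at_zero` are hypothetical, and the replicas are assumed to return exactly comparable (hashable) values.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def tmr_vote(outputs: Sequence[Hashable]) -> Hashable:
    """Return the value that at least two of the three replicas agree on."""
    if len(outputs) != 3:
        raise ValueError("TMR expects exactly three replica outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        # All three replicas disagree: the fault cannot be masked.
        raise RuntimeError("no majority among replica outputs")
    return value


def run_tmr(replicas: Sequence[Callable[[float], Hashable]], reading: float) -> Hashable:
    """Feed the same input to every replica and vote on the results."""
    return tmr_vote([replica(reading) for replica in replicas])


# Hypothetical replicas: two healthy modules and one with a stuck-at-zero fault.
def healthy(x: float) -> float:
    return x * 2


def stuck_at_zero(x: float) -> float:
    return 0


masked = run_tmr([healthy, healthy, stuck_at_zero], 21)
print(masked)  # 42
```

In this sketch the faulty replica is outvoted, so downstream components never observe the fault. A single voter is itself a single point of failure, and real sensor values usually require tolerance-based rather than exact comparison; both aspects are deliberately omitted here.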
Fault Recovery ensures that the fault does not propagate to visible results, for instance, by rolling back to a previous healthy state (checkpointing) or by retrying failed operations (time redundancy). Techniques for Fault Recovery include Reconfiguration, which changes the system's state so that the same or a similar error is prevented from occurring again, and Adaptation, which re-optimizes the system, for instance, after a Reconfiguration task [19].
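The combination of checkpointing and time redundancy can be sketched as follows. This is an illustrative example under stated assumptions, not an implementation from the surveyed literature: the class `CheckpointedState`, the function `run_with_recovery`, and the simulated `flaky_append` operation are hypothetical, and the node state is assumed to be small enough to deep-copy in memory.

```python
import copy
from typing import Any, Callable, Dict


class CheckpointedState:
    """Holds a mutable state plus a copy of the last known healthy state."""

    def __init__(self, state: Dict[str, Any]) -> None:
        self.state = state
        self._checkpoint = copy.deepcopy(state)

    def commit(self) -> None:
        # Record the current state as the new healthy checkpoint.
        self._checkpoint = copy.deepcopy(self.state)

    def rollback(self) -> None:
        # Discard any partial updates and restore the checkpoint.
        self.state = copy.deepcopy(self._checkpoint)


def run_with_recovery(node: CheckpointedState,
                      operation: Callable[[Dict[str, Any]], None],
                      max_attempts: int = 3) -> int:
    """Retry the operation, rolling back after each failure.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            operation(node.state)
            node.commit()
            return attempt
        except Exception:
            # Any exception is treated as a detected fault in this sketch.
            node.rollback()
    raise RuntimeError("operation failed after all retry attempts")


# Simulated transient fault: the first call corrupts the state, then fails.
calls = {"n": 0}


def flaky_append(state: Dict[str, Any]) -> None:
    state["samples"].append(7.5)  # partial update happens before the fault
    calls["n"] += 1
    if calls["n"] == 1:
        raise IOError("simulated transient fault")


node = CheckpointedState({"samples": []})
attempts = run_with_recovery(node, flaky_append)
print(attempts, node.state)  # 2 {'samples': [7.5]}
```

The rollback removes the partial update made by the failed first attempt, so the retried operation leaves exactly one sample in the state. Retrying only helps against transient faults; a permanent fault would exhaust the attempts and call for Reconfiguration instead.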
In Sensor Networks, different approaches to Fault Recovery have been used; they differ in resource overhead, energy efficiency, scalability, and the network types they target. For both network and node Fault Recovery in wireless sensor networks, Mitra et al. (2016) [50] compare techniques such as checkpoint-based recovery (CRAFT), agent-based recovery (ABSR), fault node recovery (FNR), cluster-based and hierarchical fault management (CHFM), and the Failure Node Detection and Recovery algorithm (FNDRA). While some of these are specific to terrestrial wireless usage, some principles (e.g., checkpointing) can also be used in wired and/or underwater environments. To reduce the network bandwidth requirements, checkpoint backups can be moved to nearby nodes [51] and used for recovering from fault situations.
In network protocols, Fault Masking and Fault Recovery are handled by error control schemes that are commonly categorized into the following three groups [1]:
The cross-layer approach benefits Fault Recovery significantly, since single-layer redundancy, such as hardware redundancy and application checkpointing, has very high costs, and the latency between fault occurrence and detection makes recovery difficult [19].