Machine Learning Methods for Rainfall–Runoff Modelling: Comparison
Please note this is a comparison between Version 1 by Muhammad Jehanzaib and Version 2 by Catherine Yang.

Runoff plays an essential part in the hydrological cycle, as it regulates the quantity of water that flows into streams and returns surplus water to the oceans. Runoff modelling may assist in understanding, controlling, and monitoring the quality and amount of water resources. In machine learning (ML) models, the association between hydrological cycle variables and runoff is examined directly, without regard for the physical processes involved. Nevertheless, such black-box ML approaches have proven adequate for modelling runoff. The most widely used ML approaches in hydrologic research include K-nearest neighbor (K-NN), decision tree (DT), fuzzy rule-based systems (FRBS), artificial neural networks (ANN), deep neural networks (DNN), adaptive neuro-fuzzy inference system (ANFIS), and support vector machine (SVM). Numerous researchers have utilized these ML models for rainfall–runoff analysis.

  • rainfall–runoff
  • data-driven modelling
  • hydrological models

1. Artificial Neural Network (ANN)

The ANN is a highly distributed parallel information processing model with certain performance attributes analogous to the human brain [1][105]. The structure of an ANN is composed of three layers: (i) input layer, (ii) hidden layer, and (iii) output layer. Several types of ANN are in use, such as feed-forward back propagation (FFBP), radial basis function (RBF), and generalized regression neural networks (GRNN). In engineering applications, the FFBP is the most extensively adopted ANN for general-purpose non-linear estimation [2][106]. ANN models have been utilized by many previous studies [3][4][5][6][99,100,101,107]. Wu et al. [3][99] employed a multi-layer neural network for runoff prediction (four steps ahead or 1 hour ahead) and concluded that as the number of prediction steps rises, the model’s accuracy falls. Accordingly, one-step-ahead predictions are more accurate than two-step-ahead predictions. Kişi [5][101] compared four different ANN training algorithms (backpropagation, Levenberg–Marquardt (LM), cascade correlation, and conjugate gradient) in predicting short-term daily runoff and concluded that the LM algorithm performs better than the other three algorithms in terms of computation time and accuracy. Jain and Kumar [6][107] proposed a hybrid ANN model by incorporating a general modelling framework and reported that the hybrid ANN model performs better than the traditional ANN. Similarly, Mutlu et al. [4][100] compared the performance of two types of ANN models, the multilayer perceptron (MLP) and the RBF, in predicting runoff at four distinct stations and confirmed the superiority of the MLP model over the RBF model in predicting surface runoff.
Deep neural networks (DNN), convolutional neural networks (CNN), long short-term memory (LSTM) networks, and recurrent neural networks (RNN) are advanced forms of ANN, and they have recently become common in rainfall–runoff modelling [7][8][9][10][96,97,98,108]. However, ANNs and DNNs have noticeable limitations, including over-fitting, local minima, learning rate selection, computation time, computation cost, and the need for manual intervention in steps such as training. Nevertheless, experts can overcome these difficulties and achieve high accuracy in runoff modelling by adjusting specific neural network settings.
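
As a rough illustration of the three-layer structure and of training by back propagation described in this section, the following minimal sketch fits a small feed-forward network with NumPy. The data are synthetic, and the choice of inputs (rainfall at the current and previous time step), the hidden layer size, and the learning rate are all assumptions made for the example; none of them comes from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, purely illustrative data (not real observations): runoff is a
# non-linear function of rainfall at the current and previous time step.
rain = rng.uniform(0.0, 1.0, size=200)
X = np.column_stack([rain[1:], rain[:-1]])                # inputs: P(t), P(t-1)
y = (0.6 * X[:, 0] ** 2 + 0.3 * X[:, 1]).reshape(-1, 1)   # target: Q(t)

# Input layer (2 nodes) -> hidden layer (5 nodes, assumed) -> output layer (1 node).
n_hidden = 5
W1 = rng.normal(0.0, 0.5, size=(2, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))
b2 = np.zeros(1)


def forward(inputs):
    """Feed-forward pass: tanh hidden layer followed by a linear output."""
    hidden = np.tanh(inputs @ W1 + b1)
    return hidden, hidden @ W2 + b2


def mse(pred, target):
    """Mean square error between predictions and targets."""
    return float(np.mean((pred - target) ** 2))


initial_loss = mse(forward(X)[1], y)

lr = 0.1  # learning rate (assumed value)
for _ in range(2000):
    hidden, pred = forward(X)
    # Back propagation: push the MSE gradient from the output to the input layer.
    grad_out = 2.0 * (pred - y) / len(X)
    grad_W2 = hidden.T @ grad_out
    grad_b2 = grad_out.sum(axis=0)
    grad_hidden = (grad_out @ W2.T) * (1.0 - hidden ** 2)  # derivative of tanh
    grad_W1 = X.T @ grad_hidden
    grad_b1 = grad_hidden.sum(axis=0)
    # Plain gradient-descent update of all weights and biases.
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2

final_loss = mse(forward(X)[1], y)
print(f"training MSE: {initial_loss:.4f} -> {final_loss:.4f}")
```

The sketch uses plain gradient descent for simplicity. The Levenberg–Marquardt and conjugate gradient algorithms compared in the studies above replace only the weight-update step and leave the layer structure unchanged.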

2. Adaptive Neuro-Fuzzy Inference System (ANFIS)

The ANFIS is a prominent soft computing approach capable of approximating any real continuous function on a compact set to any level of precision [11][109]. The ANFIS model combines the strengths of fuzzy logic and neural networks to model uncertain situations correctly. The ANFIS is a commonly used model for runoff simulation [12][13][14][15][102,103,110,111]. El-Shafie et al. [14][110] utilized the ANFIS model for monthly runoff forecasting and compared its performance with ANN. The findings suggested that the ANFIS model was capable of forecasting inflow with higher accuracy than ANN, especially in severe inflow conditions. Özger [15][111] employed the Takagi–Sugeno (TS) fuzzy inference system to simulate runoff series. The TS rule was based on a series of linear functions for predicting runoff. The TS relationship function took into account all of the uncertainty and complexity of the suggested model, and the correlation between the observed and predicted values was found to be satisfactory. Pramanik and Panda [13][103] compared the performance of two ML methods, ANN and ANFIS, trained on upstream flow data in order to predict downstream flow. The findings suggested that the neural network with a conjugate gradient algorithm performs better than the LM and gradient descent algorithms, while the ANFIS estimated outflow better than ANN. Sanikhani and Kisi [12][102] developed two distinct ANFIS models (ANFIS with subtractive clustering [ANFISSC] and ANFIS with grid partitioning [ANFISGP]) for streamflow simulation at a monthly time scale. Both proposed models were utilized to predict runoff 1 month ahead, but the performance of the ANFISSC model was slightly superior to ANFISGP in predicting river flow. The widespread implementation of ANFIS for rainfall–runoff modelling is due to the fact that the fuzzy inference system can handle the missing and convoluted data that characterize runoff.
Because it is generally difficult to characterize runoff precisely, ANFIS relies on an approximate representation (fuzzy sets) to produce reasonable results in runoff modelling. Several researchers highlighted the advantages of ANFIS, which enabled them to obtain high-accuracy results for runoff modelling at various time scales.
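
To make the idea of a Takagi Sugeno rule base built from linear functions more concrete, the sketch below evaluates a first-order TS system with two rules. The Gaussian membership functions, the rule parameters, and the function names are hypothetical and chosen by hand for illustration. In ANFIS these parameters would be tuned from data by the neural network part, which this sketch does not do.

```python
import math


def gaussian(x, center, width):
    """Gaussian membership function: degree to which x belongs to a fuzzy set."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


# Hypothetical rule base, one tuple per rule: (center, width, slope, intercept).
#   Rule 1: IF rainfall is LOW  THEN runoff = 0.1 * rainfall
#   Rule 2: IF rainfall is HIGH THEN runoff = 0.6 * rainfall - 3.0
RULES = [
    (5.0, 5.0, 0.1, 0.0),
    (30.0, 10.0, 0.6, -3.0),
]


def ts_predict(rainfall):
    """First-order Takagi-Sugeno inference for a single rainfall input."""
    # Firing strength of each rule for this input.
    weights = [gaussian(rainfall, c, w) for c, w, _, _ in RULES]
    # Linear consequent of each rule.
    outputs = [a * rainfall + b for _, _, a, b in RULES]
    # Crisp output: firing-strength-weighted average of the rule outputs.
    # (For inputs very far from every rule center the weights can underflow
    # to zero; a production implementation would guard against that.)
    total = sum(weights)
    return sum(wi * oi for wi, oi in zip(weights, outputs)) / total


for p in (5.0, 20.0, 30.0):
    print(f"rainfall {p:5.1f} -> runoff estimate {ts_predict(p):.3f}")
```

Because the output is a weighted average, it always lies between the individual rule outputs, and it shifts smoothly from the LOW rule to the HIGH rule as rainfall increases.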

3. Support Vector Machine (SVM)

The basic principle of SVM is to map the original data from the input space into a higher-dimensional feature space, where the classification problem becomes easier. In SVM, support vectors are used as selection criteria, and these support vectors produce the optimal data categorization boundaries [16][112]. Many studies have recently investigated the capability of SVM in the runoff modelling procedure. Bray and Han [17][113] highlighted the use of SVM to determine the suitable model structure and associated parameters to simulate runoff in the Bird Creek watershed. They created a flowchart for model identification in order to investigate the interaction between various model components, such as kernels (linear, sigmoidal, radial, and polynomial), scaling factors, model parameters (cost and epsilon), and input vector composition. Li and Cheng [18][93] utilized three ML approaches, namely ANN, SVM, and an extreme learning machine (ELM), for runoff prediction for two reservoirs in China. The findings suggested that all the ML methods simulated streamflow quite efficiently, while the SVM simulated runoff with a high correlation value (0.91) in the validation stage. Similarly, He et al. [19][114] compared the performance of three ML techniques, namely ANN, ANFIS, and SVM, for modelling runoff in a semi-arid climate. Various input combinations were tested, and the most appropriate input variables were selected for streamflow modelling. The results showed that the SVM model outperformed the ANFIS and ANN models. These ML techniques can also decrease the generalization error of the model in addition to the mean square error (MSE) on the training dataset. Most researchers reported that the radial basis kernel function of SVM is the most suitable for runoff modelling because the radial basis kernel has fewer parameters to adjust than the polynomial and sigmoidal kernels.
With a radial kernel, the SVM model can capture non-linear relationships between inputs and outputs. The SVM model is more suitable for long-term streamflow simulation than for short-term streamflow simulation.
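
The difference in the number of kernel parameters, and the role of the epsilon parameter in SVM regression, can be sketched as follows. The kernel forms below follow a common parameterization (with `gamma`, `coef0`, and `degree` as parameter names), which is an assumption here and not necessarily the one used in the cited studies. The input vectors are made-up values.

```python
import math


def dot(x, z):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma):
    """Radial kernel: a single shape parameter (gamma)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(x, z, gamma, coef0, degree):
    """Polynomial kernel: three parameters (gamma, coef0, degree)."""
    return (gamma * dot(x, z) + coef0) ** degree


def sigmoid_kernel(x, z, gamma, coef0):
    """Sigmoidal kernel: two parameters (gamma, coef0)."""
    return math.tanh(gamma * dot(x, z) + coef0)


def epsilon_insensitive_loss(observed, predicted, epsilon):
    """SVM regression loss: errors smaller than epsilon are not penalized."""
    return max(0.0, abs(observed - predicted) - epsilon)


# Made-up input vectors, e.g. (rainfall at t, rainfall at t-1), scaled to [0, 1].
x = (0.8, 0.3)
z = (0.5, 0.4)

print("rbf       :", rbf_kernel(x, z, gamma=0.5))
print("polynomial:", polynomial_kernel(x, z, gamma=0.5, coef0=1.0, degree=3))
print("sigmoid   :", sigmoid_kernel(x, z, gamma=0.5, coef0=0.0))
print("loss      :", epsilon_insensitive_loss(10.0, 10.5, epsilon=0.1))
```

The radial kernel equals 1 when the two inputs coincide and decays towards 0 as they move apart, so `gamma` alone controls how local the fitted relationship is. The cost parameter, not shown here, weights the epsilon-insensitive loss against model smoothness during training.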