Application of Machine Learning for Runoff Prediction

Please note this is a comparison between Version 1 by Ze Liu and Version 2 by Wendy Huang.

Water resource modeling is an important means of studying the distribution, change, utilization, and management of water resources. By establishing various models, water resources can be quantitatively described and predicted, providing a scientific basis for water resource management, protection, and planning. Traditional hydrological observation methods, often reliant on experience and statistical methods, are time-consuming and labor-intensive, frequently resulting in predictions of limited accuracy. However, machine learning technologies enhance the efficiency and sustainability of water resource modeling by analyzing extensive hydrogeological data, thereby improving predictions and optimizing water resource utilization and allocation.

water resource
machine learning
runoff prediction
medium- and long-term runoff
short-term runoff

1. Introduction

With the escalating severity of global water shortages, the management and efficient utilization of water resources have become a critical research focus [1,2]. In China, the total volume of freshwater resources amounts to 2800 billion cubic meters, representing 6% of global water resources and ranking fourth in the world after Brazil, Russia, and Canada [3]. However, China’s per capita water resources stand at only 2300 cubic meters, just a quarter of the global average, placing it among the countries with the scarcest per capita water resources [4]. Precipitation in China diminishes from the southeast coast to the northwest inland, and the country is divided into five zones: rainy, humid, semi-humid, semi-arid, and arid [5]. Owing to the heterogeneous distribution of precipitation across regions, a pronounced supply–demand imbalance of water resources exists in China, particularly in the northern regions [6]. With population growth and economic advancement, the demand for water resources steadily increases while the supply remains limited, leading to water shortages in some areas [7]. In some regions of China, water resource utilization efficiency is noticeably low, primarily due to inadequate attention to water resources, insufficient water-saving measures, and a lack of institutional guarantees [8]. Furthermore, with the accelerated pace of industrialization and urbanization, substantial quantities of industrial and agricultural wastewater, domestic sewage, and solid waste are persistently discharged, leading to severe pollution of rivers, lakes, and groundwater [9]. Consequently, there is a critical need to strengthen the protection of water resources and to implement effective measures that improve water resource utilization efficiency [10].

Water resources are renewable and vary in quantity both within and between years, following a distinct cycle and pattern (Figure 1). Driven by solar radiation, Earth’s gravity, and other physical processes, including evapotranspiration, precipitation, soil infiltration, surface runoff, and underground flow, water moves from one location to another [11]. Water resources form a critical foundation for the survival and development of human society [12]. To enhance the management and utilization of water resources, various computational models have been employed in water resource management [13]. Traditional hydrological models operate on the principle of the hydrological cycle and describe the components of the hydrological cycle system [14]. These models have a clear physical meaning and are straightforward to interpret; however, their development process is complex and requires extensive expertise from developers [15]. Machine learning can process massive amounts of data, extract valuable information, and automatically construct models to predict future trends [16]. In water resource management, the information extracted from such data enables a more comprehensive consideration of problems and the establishment of improved decision-making models [17]. Consequently, machine learning, owing to its robust data mining capabilities, is widely applied in areas such as water resource supply and demand prediction, flood risk management, water quality monitoring, and forecasting scenarios [18,19,20,21,22].

Figure 1. Schematic diagram of the water resource cycle.

Forecasting runoff plays a crucial role in optimizing water resource systems and mitigating the impacts of destructive natural disasters, such as floods and droughts, through both long-term planning and short-term emergency warnings [23,24,25]. Due to the complexity of the causes of runoff and the challenging nature of understanding their mechanisms, constructing machine learning-based models for runoff prediction emerges as an effective solution [26]. When classified by time scale, runoff prediction can be segmented into medium- and long-term predictions, as well as short-term forecasting [27].

2. Medium- and Long-Term Runoff Prediction

Prediction of medium- and long-term runoff is vital for developing water resource scheduling plans that span extended durations, and it therefore has a significant impact on water resource management [28]. Ghumman et al. developed an ANN-based runoff model using continuously measured monthly rainfall and runoff data, applying it to predict monthly runoff in the Hoab River Basin of Pakistan [29]. ANNs can perform comparably to traditional conceptual models without requiring a detailed investigation of the hydrological and geological parameters of the catchment. Nevertheless, datasets frequently contain a high proportion of noise and errors, which can impede the ability of ANN-based models to make efficient and accurate predictions. Owing to climate change and human activities, natural runoff often contains multiple frequency components, which makes it difficult for traditional ANN-based models to capture the underlying change processes efficiently.
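The monthly rainfall and runoff setup described above can be framed as a supervised learning problem in which past months are the inputs and the current month's runoff is the target. The following is a minimal sketch of that framing only, not the configuration used by Ghumman et al.; the function name `make_lagged_samples`, the choice of two lags, and the six-month record are hypothetical and purely illustrative.

```python
def make_lagged_samples(rainfall, runoff, n_lags):
    """Build (features, target) pairs for one-step-ahead monthly prediction.

    Each feature vector holds the previous `n_lags` months of rainfall
    followed by the previous `n_lags` months of runoff; the target is the
    runoff of the current month.
    """
    if len(rainfall) != len(runoff):
        raise ValueError("rainfall and runoff must have the same length")
    if n_lags < 1 or n_lags >= len(runoff):
        raise ValueError("n_lags must be between 1 and len(series) - 1")

    features, targets = [], []
    for t in range(n_lags, len(runoff)):
        # Inputs use only months strictly before month t (no leakage).
        window = list(rainfall[t - n_lags:t]) + list(runoff[t - n_lags:t])
        features.append(window)
        targets.append(runoff[t])
    return features, targets


# Hypothetical six-month record (illustrative values, not from any basin).
rain = [10.0, 40.0, 80.0, 30.0, 5.0, 0.0]
flow = [2.0, 6.0, 15.0, 9.0, 3.0, 1.0]
X, y = make_lagged_samples(rain, flow, n_lags=2)
```

The resulting `X` and `y` could then be passed to any regression model, an ANN included. Keeping the window strictly in the past is the main design point, since it prevents the target month from leaking into its own inputs.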

Tan et al. employed multi-year runoff data and introduced an ANN-based runoff model combined with ensemble empirical mode decomposition (EEMD). The results demonstrate that the proposed EEMD-ANN model significantly improves on the accuracy of the plain ANN-based method in predicting medium- and long-term runoff time series [30]. Liao proposed a hybrid framework for long-term runoff prediction that takes previous inflow and specific meteorological factors, such as precipitation, evapotranspiration, solar radiation, and soil temperature, as input features. The results showed that combining EEMD and ANN for modeling improves accuracy [31]. The main challenge in medium- and long-term runoff prediction is its low accuracy, which arises from the extended prediction period and the complexity of the runoff genesis mechanism. Consequently, identifying key factors is of critical importance. Han et al. introduced an LSTM-based model, AT-LSTM, which combines double attention mechanisms in the input and hidden layers. The model uses rainfall, potential evapotranspiration, and monthly climate phenomenon index data (including 88 atmospheric circulation indicators, 26 sea surface temperature indicators, and 16 other indicators) as inputs for long-term runoff prediction. The results demonstrate that the AT-LSTM model effectively enhances the accuracy of long-term prediction and identifies the dynamic effects of the input factors [32].
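To illustrate how an attention mechanism on the input layer can highlight key factors, the sketch below re-weights one time step of input features with softmax-normalized relevance scores. This is a generic illustration and not the AT-LSTM formulation, whose details are not given here; the function names, the three features, and the hand-picked scores are all assumptions. In a trained model the scores would be produced by learned parameters.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def apply_input_attention(inputs, scores):
    """Re-weight one time step of input features by attention weights.

    `inputs` are the feature values at a time step and `scores` are the
    relevance scores for those features (supplied by hand in this sketch).
    """
    if len(inputs) != len(scores):
        raise ValueError("inputs and scores must have the same length")
    weights = softmax(scores)
    weighted = [w * x for w, x in zip(weights, inputs)]
    return weights, weighted


# Hypothetical standardized inputs for one month:
# rainfall, potential evapotranspiration, one climate index.
features = [1.2, -0.4, 0.3]
scores = [2.0, 0.5, 0.5]  # hand-picked for illustration, not learned
weights, weighted = apply_input_attention(features, scores)
```

Because the weights are positive and sum to 1, they can be read as the relative importance the model assigns to each input at that time step, which is what allows an attention-based model to report how the influence of each factor changes over time.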

3. Short-Term Runoff Prediction

Zealand et al. examined the efficacy of ANNs for short-term runoff prediction, and the results demonstrated that the ANN-based models consistently outperform traditional models [33]. Kratzert et al. used meteorological forcing data (maximum air temperature, precipitation, radiation, vapor pressure, etc.) and static catchment characteristics (drought index, PET mean, max water content, geological permeability, forest fraction, etc.) as features to build an LSTM-based model for daily runoff prediction. The results surpassed those of a well-established physical model [23]. Based on data from 98 rainfall–runoff events, Hu et al. compared the performance of ANN-based and LSTM-based models in simulating runoff processes during these events at lead times of 1 to 6 h. The results showed that the LSTM-based model outperformed the ANN-based models [34]. Gao et al. developed short-term runoff prediction models using LSTM and gated recurrent unit (GRU) networks, based on hourly flow measurements from one runoff station and hourly rainfall data from four rainfall stations [35]. The experiments demonstrated that the GRU model requires the least training time and has a simpler structure, making it the preferred method for short-term runoff prediction.
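The simpler structure of the GRU can be made concrete by counting trainable parameters in a single recurrent layer. The sketch below assumes the textbook formulation with one bias vector per gate block (some frameworks use two, which changes the totals but not the 4:3 ratio). The five inputs follow the one-flow-plus-four-rainfall setup described above, while the hidden size of 64 is a hypothetical choice and not a value reported by Gao et al.

```python
def recurrent_layer_params(n_inputs, n_hidden, n_gate_blocks):
    """Count trainable parameters in one recurrent layer.

    Each gate block has an input weight matrix (n_hidden x n_inputs),
    a recurrent weight matrix (n_hidden x n_hidden), and one bias vector.
    """
    per_block = n_hidden * n_inputs + n_hidden * n_hidden + n_hidden
    return n_gate_blocks * per_block


def lstm_params(n_inputs, n_hidden):
    # LSTM: input, forget, and output gates plus the candidate cell state.
    return recurrent_layer_params(n_inputs, n_hidden, n_gate_blocks=4)


def gru_params(n_inputs, n_hidden):
    # GRU: update and reset gates plus the candidate hidden state.
    return recurrent_layer_params(n_inputs, n_hidden, n_gate_blocks=3)


n_inputs = 5   # one flow series + four rainfall series
n_hidden = 64  # hypothetical layer width
print(lstm_params(n_inputs, n_hidden))  # 17920
print(gru_params(n_inputs, n_hidden))   # 13440
```

For the same layer width, the GRU layer carries three quarters of the parameters of the LSTM layer, which is consistent with its shorter training time.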

Although these models achieve high accuracy in runoff prediction, none can eliminate predictive uncertainty, owing to issues such as noisy or incomplete data. To improve prediction accuracy and reduce data dependency, Naganna et al. used runoff time series from the Cauvery River in India. They applied a deep learning model (CNN) and two ensemble tree models, random forest (RF) and gradient tree boosting (GTB), and used the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) to select the ideal input parameters for daily-scale prediction in multiple basins of the river. The results indicated that the CNN, with its inputs selected by AIC and BIC, achieves excellent prediction accuracy [36].
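Input selection with AIC and BIC can be sketched as follows: each candidate input set is scored by its goodness of fit plus a penalty on the number of parameters, and the set with the lowest score is chosen. The formulas below are the standard least-squares forms under Gaussian errors, which is an assumption, since the exact variant used by Naganna et al. is not stated here. The candidate names, residual sums of squares, and parameter counts are hypothetical.

```python
import math


def aic(rss, n_samples, n_params):
    """Akaike Information Criterion for a least-squares fit (Gaussian errors)."""
    return n_samples * math.log(rss / n_samples) + 2 * n_params


def bic(rss, n_samples, n_params):
    """Bayesian Information Criterion for the same setting."""
    return n_samples * math.log(rss / n_samples) + n_params * math.log(n_samples)


def best_candidate(candidates, n_samples, criterion):
    """Return the name of the candidate input set with the lowest score."""
    return min(
        candidates,
        key=lambda name: criterion(
            candidates[name]["rss"], n_samples, candidates[name]["k"]
        ),
    )


# Hypothetical candidates: residual sum of squares (rss) and
# number of fitted parameters (k) for three sets of lagged inputs.
candidates = {
    "lags_1": {"rss": 120.0, "k": 2},
    "lags_1_to_3": {"rss": 100.0, "k": 4},
    "lags_1_to_7": {"rss": 99.0, "k": 8},
}
n = 100  # hypothetical number of samples

chosen_by_aic = best_candidate(candidates, n, aic)
chosen_by_bic = best_candidate(candidates, n, bic)
```

In this example both criteria prefer the three-lag set: adding lags 4 to 7 barely reduces the residual error, so the penalty outweighs the gain. BIC penalizes each extra parameter by ln(n) rather than 2, so for larger samples it tends to favor smaller input sets than AIC.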

In summary, several features influence runoff, with rainfall, evapotranspiration, temperature, and radiation being the most crucial factors. Among the models reviewed here, the AT-LSTM model with double attention mechanisms performs best for medium- and long-term runoff prediction, while the CNN-based model with inputs selected by AIC and BIC is the most effective for short-term runoff prediction. At present, global climate change is affecting the hydrological cycle and, in turn, the hydrological processes in watersheds. Non-stationary hydrological sequences, shaped by factors such as climatic and meteorological conditions, subsurface conditions, and human activities, introduce new challenges to hydrological forecasting and related research [37]. Under non-stationary conditions, traditional runoff forecasting methods cannot be directly applied. In the future, it will be necessary to explore the physical mechanisms, integrate machine learning methods, and incorporate hydrological–meteorological information to establish runoff prediction models suitable for non-stationary conditions.