Another model uses this last result to maximize the number of turbines by restricting each turbine’s start-up and shut-down times. Each day, a new scenario tree is constructed based on meteorological data, and the water level in the reservoir is computed. The second approach employs the median scenario determined by the black box, whereas the third method, scenario fans, allows only the tree’s root to have several child nodes.
Since most existing works focus on predicting water inflows to reservoirs, the author explores the application of machine learning to Cyber-Physical Systems.
3.1. Linear Regression
A Linear Regression (LR) model fits a linear function to the relationship between two or more variables. Typically, the goal is to minimize a cost function based on the distance between each data point and the fitted function. This can be accomplished with a “closed-form” equation that directly computes the model parameters that best fit the data.
However, a high number of features and instances may necessitate using a gradient descent method to optimize the cost function [17]. Despite this, these models are renowned for their simplicity and speed of computation.
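The two fitting strategies mentioned above can be compared in a short sketch. This is only an illustration on synthetic data: the function names, learning rate, and iteration count are assumptions made for the example and are not taken from [17].

```python
import numpy as np

def fit_closed_form(X, y):
    """Least-squares parameters computed directly (no iterations)."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    theta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return theta

def fit_gradient_descent(X, y, lr=0.1, n_iter=5000):
    """Same least-squares problem, solved by batch gradient descent on the MSE."""
    Xb = np.column_stack([np.ones(len(X)), X])
    theta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = 2.0 / len(y) * Xb.T @ (Xb @ theta - y)  # gradient of the MSE cost
        theta -= lr * grad
    return theta

# Synthetic data (assumed for illustration): y = 3 + 2*x1 - 1*x2 + small noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 0.01, 200)

theta_cf = fit_closed_form(X, y)
theta_gd = fit_gradient_descent(X, y)
```

On a small problem like this one both routes should reach essentially the same parameters; gradient descent becomes attractive mainly when the dataset is too large for the direct solve.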
A study in [18] examines 30 years of water inflow data at an annual resolution (one year per instance). Its objective is to create a prediction model of annual water inflows for a hydropower plant, with the data collected and standardized for each location in the dataset.
The results demonstrate that the model can estimate the trend of plants separated by rivers. When a period contains fewer than 30 years, the model is especially sensitive to outliers.
The work in [19] focuses on creating a strategy for forecasting the change in the annual hydroelectric output of federal hydroelectric plants in the United States. This method is based on the correlation between geological runoff data and the yearly hydroelectric production of 132 U.S. plants.
Monthly data were collected between 1989 and 2008, a period of twenty years. Three types of projection models are used: global, regional, and local. The data indicate that the correlation between runoff and hydropower production is increasing, with an overall tendency toward a drier climate.
A seasonal runoff projection model is also built by treating each season separately. Seasons are three-month intervals, starting with spring in March. The trend appears to vary considerably depending on the region observed.
The linear regression results were used to forecast the change in production until 2039. This forecast offers fresh perspectives on annual and seasonal production that might be utilized in future endeavors.
The research presented in [16] aims to predict the water level in a reservoir used by a hydroelectric plant under two different scenarios over a relatively short time horizon.
The first scenario uses rainfall and water level as inputs, whereas the second uses precipitation, water level, and water release from the power plant.
Four machine learning methods were tested in [16]:
Boosted Decision Tree Regression,
Decision Forest Regression,
Bayesian Linear Regression, and
Neural Network Regression.
When looking at the results on a Taylor diagram and comparing them with other machine learning performance metrics (MAE, MSE, RMSE, R2, and RAE), the Bayesian Linear Regression approach produces the best overall outcomes.
The daily data set contains 12,531 records covering 34 years of water level and precipitation data, collected from 1985 to 2019, and the hourly data set contains 82,057 records collected between 2010 and 2019. The results show that all methods are suitable for water level prediction, but Bayesian Linear Regression is particularly effective for the first scenario and Boosted Decision Tree Regression for the second scenario.
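The performance metrics listed above (MAE, MSE, RMSE, R2, and RAE) follow standard definitions, which can be sketched as follows. The function name and the small example arrays are hypothetical and are not data from [16].

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Standard regression error metrics for observed vs. predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    dev = y_true - y_true.mean()  # deviation from the mean of the observations
    mse = np.mean(err ** 2)
    return {
        "MAE": np.mean(np.abs(err)),                       # mean absolute error
        "MSE": mse,                                        # mean squared error
        "RMSE": np.sqrt(mse),                              # root mean squared error
        "R2": 1.0 - np.sum(err ** 2) / np.sum(dev ** 2),   # coefficient of determination
        "RAE": np.sum(np.abs(err)) / np.sum(np.abs(dev)),  # relative absolute error
    }

# Hypothetical water levels (observed vs. predicted)
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

RAE and R2 both compare the model against a baseline that always predicts the mean of the observations, which makes them easier to compare across reservoirs than the scale-dependent MAE and RMSE.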
The scholars in [20] discuss the creation of several medium-term regression models (monthly and quarterly) to anticipate the amount of energy produced by a hydroelectric plant.
Four regression algorithms are compared: power regression, multiple linear regression, Gaussian process regression, and support vector regression. The precipitation data in this set spans a period of 26 years, from 1993 to 2019, and was collected from six different locations.
The information that was utilized refers to the amount of rainfall, temperature, and evaporation that occurred from the plant reservoir. According to the findings, the Gaussian process regression model is superior to the other approaches in terms of performance.
Gaussian process regression is a method defined by a mean and a covariance; it is non-parametric, is appropriate for use with small datasets, and is able to take into account the uncertainty of the predictions.
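To make the uncertainty estimate concrete, a minimal Gaussian process regression sketch is given below. It assumes a zero mean function and a squared-exponential (RBF) kernel with fixed hyperparameters; these choices, the function names, and the toy data are illustrative assumptions and do not reproduce the configuration used in [20].

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two 1-D input arrays."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train) + noise ** 2 * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    L = np.linalg.cholesky(K)  # stable alternative to inverting K
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std

# Toy data: five observations of a smooth function
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.sin(x_train)
x_test = np.array([2.0, 10.0])  # one point inside the data, one far outside
mean, std = gp_posterior(x_train, y_train, x_test)
```

The predicted standard deviation should be small near the observations and revert to the prior level far from them, which is the behavior that lets the method report how much a forecast can be trusted.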
By employing this model, the scholars in [20] could establish a connection between the weather forecast and the amount of power the plant generated. According to the scholars, monthly data are not ideal for forecasting energy generation; instead, quarterly precipitation generates the most accurate projections with a high correlation. Recent research offers an abundance of regression algorithms, so the same problem usually has to be tested with each approach before a model is selected.
Despite the simplicity of this type of procedure, the results obtained with these methods are generally of high quality. However, regression algorithms can only supply limited information.
In general, the result of the predictions made by this class of algorithms reveals more information about the situation, allowing for a more precise examination of the projected data. On a larger scale, linear regression algorithms appear better suited as a complement to more advanced machine learning algorithms that address a broader definition of the hydropower production problem.
3.2. Random Forest
A decision tree is a type of machine learning algorithm that is able to perform tasks involving classification as well as regression. As described in [17], it builds a model from a labelled dataset by basing its decisions on the characteristics of the input data. Decision trees are the fundamental building blocks of the Random Forest (RF) model, which is one of the most effective machine learning algorithms.
According to the scholars in [21], in comparison to linear regression models, random forest makes use of ensemble learning by constructing a large number of distinct trees. These trees are then used to make many predictions based on an input and to provide a more accurate inference from the variables.
The bagging method is often described as the main reason why ensemble methods work so well in the random forest algorithm. Bagging trains each decision tree of a random forest on a slightly different subset of the training set in order to obtain a different prediction each time [22].
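The bagging idea described above can be sketched with very small trees. The example below uses one-split regression trees (stumps) on a single feature and averages their predictions; the stump learner, the function names, and the synthetic data are simplifying assumptions for illustration and are not the procedure of [22]. A full random forest would additionally grow deeper trees and sample features at each split.

```python
import numpy as np

def fit_stump(x, y):
    """Fit a one-split regression tree on a single feature by minimizing squared error."""
    best = (np.inf, x.mean(), y.mean(), y.mean())
    for t in np.unique(x)[:-1]:  # exclude the maximum so the right side is never empty
        left, right = y[x <= t], y[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]  # (threshold, left value, right value)

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def bagged_stumps(x, y, n_trees=50, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = np.random.default_rng(seed)
    stumps = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(x), size=len(x))
        stumps.append(fit_stump(x[idx], y[idx]))
    return stumps

def predict_bagged(stumps, x):
    """Average the predictions of all stumps."""
    return np.mean([predict_stump(s, x) for s in stumps], axis=0)

# Synthetic step-shaped data with noise
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 100)
y = np.where(x < 0.5, 0.0, 1.0) + rng.normal(0, 0.1, 100)
stumps = bagged_stumps(x, y)
pred = predict_bagged(stumps, np.array([0.1, 0.9]))
```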
A first model is built by observing the links between market price and inflows and labeling each feature as stochastic or deterministic; it classifies each occurrence as deterministic or stochastic.
The second regression model is trained to forecast a decision heuristic for determining if a deterministic approach should be employed for the current market. The dataset was altered to obtain more accurate predictions. By comparing the performance of some data to the market price, the strategy gap function was introduced. To examine the relationships between the features in the set, a correlation matrix with Spearman’s coefficient was generated.
No feature reduction was undertaken based on the results of test models. The gradient boosting decision tree approach was employed as the random forest algorithm [23].
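The Spearman correlation matrix used above for feature analysis can be sketched as a Pearson correlation computed on ranks. This minimal version assumes there are no tied values within a column (ties would require average ranks); the function name and the toy features are hypothetical.

```python
import numpy as np

def spearman_matrix(data):
    """Spearman correlation between the columns of a 2-D array (assumes no ties)."""
    # argsort applied twice turns each column into 0-based ranks
    ranks = np.argsort(np.argsort(data, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

# Toy features: b increases monotonically with a, c decreases with a
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
features = np.column_stack([a, a ** 3, -a])
rho = spearman_matrix(features)
```

Because it works on ranks, the coefficient captures any monotonic relationship, not only linear ones, which suits features such as prices and inflows whose relationship is rarely linear.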
The selection of the model’s hyperparameters was based on 1000 random parameter selections applied to five random sets. The performance of the models was determined by calculating the accurate classification rate based on the total number of classifications conducted, the performance gap, and the average performance of an optimal design.
3.3. Reinforcement Learning
Environment and state are defined in the context of hydropower production by the data produced by power plants and their reservoirs. A policy can be represented in a variety of ways.
In [24], a deep reinforcement learning (DRL) method is used to optimize a cascade power plant network located on the Hun River in northern China. Specifically, the authors use the Deep Q-Network (DQN) method, introduced in [25], as a prediction model and apply a Bayesian aggregation–disaggregation technique to the three reservoirs to reduce the dimensionality of the problem.
The DRL consists of an agent with two neural networks as its brain (an action and a target network), allowing it to make decisions regarding its surroundings. The environment is represented by the dataset of hydroelectric power facilities. From 1967 to 2015, the statistics include daily precipitation and 10-day inflows for each reservoir.
After receiving information on the state of its environment, the agent uses the DQN to decide which action to take. The model simulates this action and returns a reward to the agent based on the deviation between the system’s demand and the amount of energy that was produced. The agent stores all of its previous states, actions, and rewards so that it can continue learning from them.
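The interplay between the action network, the target network, and the stored experience can be sketched as follows. To stay short, the example replaces the neural networks with linear Q-functions (weight matrices); this substitution, the class and function names, and all numeric values are assumptions for illustration and do not reproduce the model of [24].

```python
import random
from collections import deque
import numpy as np

class ReplayMemory:
    """Fixed-size store of past (state, action, reward, next_state) transitions."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng):
        return rng.sample(list(self.buffer), batch_size)

def q_values(weights, state):
    """Linear stand-in for a Q-network: one row of weights per action."""
    return weights @ state

def train_step(weights, target_weights, batch, lr=0.01, gamma=0.99):
    """One Q-learning update; targets come from the slowly changing target network."""
    for state, action, reward, next_state in batch:
        target = reward + gamma * np.max(q_values(target_weights, next_state))
        td_error = target - q_values(weights, state)[action]
        weights[action] += lr * td_error * state
    return weights

# Hypothetical setup: 2 actions, 3 state features
weights = np.zeros((2, 3))        # action network
target_weights = np.ones((2, 3))  # target network (periodically copied from weights)
memory = ReplayMemory(capacity=2)
memory.push(np.array([1.0, 0.0, 0.0]), 0, 1.0, np.array([0.0, 1.0, 0.0]))
batch = memory.sample(1, random.Random(0))
weights = train_step(weights, target_weights, batch)
# After several updates the target network would be synchronized:
# target_weights = weights.copy()
```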
The DRL model is compared against three Stochastic Dynamic Programming (SDP) models. The DRL methods are reported to outperform their SDP counterparts, but few conclusions are drawn from this comparison, and the graphs seem to show similar results between the two methods.
The authors note that DRL with memory is applicable to the real-time production problem, and that Bayesian aggregation–disaggregation appears to be appropriate for cascading reservoir systems.
The paper [26] applies DRL to a long-term horizon problem to optimize annual revenues based on water inflow and electricity prices. The reinforcement learning algorithm is an actor-critic method combined with Q-learning.
The water level inside a reservoir represents the environment, and the agent’s goal is to keep the reservoir’s water level balanced so as to minimize the amount of water that spills and maximize the weekly revenue.
The agent’s action is to determine, based on the current market price of power, what percentage of the water in the reservoir should be converted into energy.
The reward function for an action is computed from the maximum power that can be produced relative to the reservoir’s capacity, the weekly electricity price, and the importance attached to this price.
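The environment dynamics described above can be illustrated with a toy single-reservoir step function. This is only a sketch: the simple water balance, the reward defined as price times released energy, the function name, and the numeric defaults are all assumptions, and the actual reward in [26] additionally accounts for the maximum production capacity and the importance attached to the price.

```python
def reservoir_step(storage, inflow, price, release_fraction,
                   capacity=100.0, energy_per_unit=1.0):
    """Advance a toy reservoir by one week.

    release_fraction is the share of stored water turned into energy (the action).
    Returns the new storage, the revenue earned this week, and the spilled volume.
    """
    release = release_fraction * storage
    storage_after = storage - release + inflow
    spill = max(0.0, storage_after - capacity)  # water above capacity is lost
    storage_after = min(storage_after, capacity)
    reward = price * energy_per_unit * release
    return storage_after, reward, spill

# Releasing a quarter of the water avoids spilling and earns revenue
with_release = reservoir_step(storage=80.0, inflow=40.0, price=2.0, release_fraction=0.25)
# Releasing nothing fills the reservoir and spills the excess
no_release = reservoir_step(storage=80.0, inflow=40.0, price=2.0, release_fraction=0.0)
```

Even in this reduced form the trade-off is visible: holding water back for a better price risks spilling it when inflows are high.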
The critic is composed of four neural networks, including a network describing the value of the state, a target network allowing a better convergence of the error backpropagation algorithm, and two Q-networks to obtain the Q value.
The decisions that the actor makes are determined by a neural network designed to represent the policy for each state. Notably, the authors use RMSprop to optimize their networks, the optimizer being one of the hyperparameters.
This choice was made because, compared to Adam, RMSprop depends less on momentum when applied to non-stationary data and a constantly shifting environment. In addition, using RMSprop helps to level out the differences in learning rates and avoids spending too long exploring a local minimum.
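The difference between the two optimizers can be seen in their update rules, sketched below in their commonly published forms. The hyperparameter defaults are typical values chosen for illustration and are not the settings reported in [26].

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=0.01, rho=0.9, eps=1e-8):
    """RMSprop: scale the step by a running average of squared gradients (no momentum)."""
    cache = rho * cache + (1.0 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

def adam_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: the same scaling plus a momentum term m, both with bias correction."""
    m = beta1 * m + (1.0 - beta1) * grad       # running average of gradients (momentum)
    v = beta2 * v + (1.0 - beta2) * grad ** 2  # running average of squared gradients
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# One step of each optimizer from the same starting point
w_rms, cache = rmsprop_step(w=1.0, grad=2.0, cache=0.0)
w_adam, m, v = adam_step(w=1.0, grad=2.0, m=0.0, v=0.0, t=1)
```

RMSprop uses only the current gradient for the direction of the step, whereas Adam follows a running average of past gradients, which can lag behind when the objective keeps shifting.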
The model is trained on an artificial scenario set in addition to a scenario set developed using European Nordic market value data from 2008 to 2019, Norwegian water supply data from 1958 to 2019, and four reservoirs with comparable meteorological conditions. The model converges after one day of training on 300,000 weeks, using a processor with 3.1 GHz and 16 GB of RAM [26].
The work demonstrates the viability of a reinforcement learning model on a minimal hydroelectric generation problem, in a field where hydroelectric optimization models dominate. The authors also raise the option of extending the model with algorithms such as aggregation–disaggregation [26].