Similar to solar energy, the prediction of wind energy poses a challenge due to its nonlinearity and randomness, which result in inconsistent power generation. Consequently, there is a need for an effective model to forecast wind energy, as evidenced by research studies
[24][25]. In light of the rising global population and increasing energy demand, wind energy is viewed as a feasible alternative to depleting fossil fuels. Offshore wind farms are particularly advantageous compared to onshore wind farms since they offer higher capacity and access to more wind sources
[26]. ML and DL models and algorithms are employed in wind energy development, utilizing wind speed data and other relevant information. Various researchers have proposed different models to increase prediction accuracy. For example, Zendehboudi et al. suggested the SVM model as superior to other models and introduced hybrid SVM models
[27]. Wang et al. proposed an HM comprising a combination of models for short-term wind speed prediction
[28]. Demolli et al. used five ML algorithms to predict long-term wind power, finding that the SVR algorithm is most effective when the standard deviation is removed from the dataset
[29]. Xiao et al. suggested using a self-adaptive kernel extreme learning machine (KELM) to enhance forecasting precision
[30]. The ARIMA and nonlinear autoregressive exogenous (NARX) models were evaluated by Cadenas et al., who concluded that the NARX model had less error
[31]. Wind power and speed were predicted in other studies using a variety of models, including the improved dragonfly algorithm (IDA) with SVM (IDA–SVM) model, local mean decomposition (LMD), firefly algorithm (FA) models, and the CNN model
[15][32][33].
ML and DL have significantly advanced the field of forecasting renewable energy. However, there are still several issues that need to be resolved. For instance, the choice of ML and DL algorithms, the selection of input data, and the handling of missing data are essential factors that affect the precision of forecasting models for renewable energy. Additionally, there is a need to develop robust and interpretable models that can provide insights into the factors that influence renewable energy generation.
2. Machine Learning-Based Forecasting of Renewable Energy
2.1. Supervised Learning
ML is a subset of artificial intelligence that seeks to enable machines to learn from data and improve their ability to perform a particular task
[34][35]. The process involves developing statistical models and algorithms that enable computers to identify patterns in data and utilize them to make decisions or predictions. In essence, ML involves teaching a computer to identify and react to specific types of data by presenting it with extensive examples, known as “training data.” This training procedure helps the computer identify patterns and make predictions or decisions based on fresh data that it has not encountered previously
[36][37][38]. The applications of ML span diverse industries such as healthcare, finance, e-commerce, and others
[39][40][41][42][43][44]. In addition, ML techniques can be leveraged for predicting renewable energy generation, resulting in better management of renewable energy systems with improved efficiency and effectiveness. There are multiple ML algorithms available, each with distinct strengths and weaknesses. The algorithms can be categorized into three primary groups: supervised learning, unsupervised learning, and reinforcement learning
[45].
Supervised learning refers to an ML method that involves training a model using labeled data. The labeled data comprises input-output pairs, where the input is the data on which the model is trained and the output is the expected outcome
[46][47]. The model learns to map inputs to outputs by reducing the error between the predicted and actual outputs during training. Once trained, the model can be applied to generate predictions on new, unlabeled data
Regression and classification are the two basic sub-types of supervised learning algorithms (Figure 1)
[46].
Figure 1. ML types and algorithms.
Table 1 presents a comparative analysis of various ML and DL algorithms, outlining their respective advantages and disadvantages across different applications.
Table 1. ML and DL techniques with their pros and cons in different applications.
1. Regression: Regression is a supervised learning approach that forecasts a continuous output variable based on one or more input variables. Regression aims to identify a mathematical function that maps the input variables to a continuous output variable, which may represent a single value or a range of values
Linear regression, polynomial regression, and support vector regression (SVR) are the three main regression algorithms in supervised learning
[51].
Linear and Polynomial Regression: Linear regression is a prevalent and straightforward approach used to forecast a continuous output variable utilizing one or more input variables. It uses a straight line to indicate the correlation between the input variables and the output variables
[52]. On the other hand, polynomial regression, a type of linear regression, employs nth-degree polynomial functions to depict the connection between input features and the outcome variable
[53]. This can enhance the accuracy of predictions by enabling the model to capture more intricate correlations between the input data and the target variable. In renewable energy forecasting, both linear and polynomial regression can be used to predict the power output of RES such as solar and wind power
[54][55]. Weather information such as temperature, humidity, and wind speed are frequently included in the input characteristics, along with historical power output data. The target variable is the power output of the renewable energy source, which can be predicted using the input features.
For instance, Ibrahim et al. (2012) used data from a weather station collected over three years to create a linear regression model to predict solar radiation in Perlis. The model used three input variables (average daily maximum and minimum temperatures, as well as the average daily solar radiation) and had a good fit with an R-squared value of 0.954. The authors concluded that their model could be a useful tool for estimating solar radiation in Perlis
[56]. Ekanayake et al. (2021) created artificial neural network (ANN), multiple linear regression (MLR), and power regression (PR) models to produce wind power prediction models for a Sri Lankan wind farm. In their modeling approach, they utilized climate parameters such as average wind speed and average ambient temperature as input variables. The models were developed using five years of power generation data and showed acceptable accuracy with low RMSE, low bias, and a high correlation coefficient. The ANN model was the most precise, but the MLR and PR models still provide useful insights for other wind farms in the same area
[57]. Mustafa et al. (2022) also compared four regression models (linear regression, logistic regression, Lasso regression, and elastic regression) for solar power prediction. The results showed that all four models are effective, but elastic regression outperformed the others in predicting maximum solar power output. Principal component analysis (PCA) was also applied and further improved the results of the elastic regression model. The study highlights the strengths and weaknesses of each solar power prediction model
[58].
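As a concrete illustration of this workflow, the following minimal sketch fits both a linear and a degree-2 polynomial regression to synthetic weather-style features with scikit-learn; the feature layout and data are placeholders for illustration only, not taken from the studies above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Hypothetical dataset: weather features (e.g., temperature, humidity, wind
# speed, irradiance) and the measured power output of a renewable plant.
rng = np.random.default_rng(0)
X = rng.random((500, 4))                                            # placeholder features
y = 3.0 * X[:, 3] + 0.5 * X[:, 2] + 0.1 * rng.standard_normal(500)  # placeholder power

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Plain linear regression: a straight-line (hyperplane) fit.
linear = LinearRegression().fit(X_train, y_train)

# Polynomial regression: expand the features to degree 2 before the linear fit,
# allowing the model to capture curved relationships.
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X_train, y_train)

for name, model in [("linear", linear), ("polynomial", poly)]:
    rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    print(f"{name} RMSE: {rmse:.3f}")
```

In practice, the input matrix would hold measured weather and lagged power data, and model selection would rely on cross-validation rather than a single train/test split.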
Support Vector Regression (SVR): The SVR algorithm is utilized in regression analysis within the field of ML
[59]. It works by finding, in a high-dimensional feature space, the hyperplane that best fits the data, keeping as many points as possible within a margin of tolerance while minimizing the discrepancy between the predicted and actual values. It is also a powerful model for predicting the renewable energy potential of a specific location. For example, Yuan et al. (2022) proposed a jellyfish search algorithm-optimized SVR (IJS-SVR) model to predict wind power output and address grid connection and power dispatching issues. The SVR was optimized using the IJS technique, and the model was tested in both spring and winter. IJS-SVR outperformed other models in both seasons, providing an effective and economical method for wind power prediction
[60]. In addition, Li et al. (2022) created ML-based algorithms for short-term solar irradiance prediction, incorporating a hidden Markov model and SVM regression techniques. Using Bureau of Meteorology data, they demonstrated that their algorithms can effectively forecast solar irradiance over 5–30 min intervals in various weather conditions
[61]. Mwende et al. (2022) developed SVR and random forest regression (RFR) models for real-time photovoltaic (PV) power output forecasting. On the validation dataset, SVR performed better than RFR, with an RMSE of 43.16, an adjusted R² of 0.97, and an MAE of 32.57, in contrast to RFR’s RMSE of 86, adjusted R² of 0.90, and MAE of 69
[62].
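The following minimal SVR sketch uses the same assumptions (scikit-learn, synthetic placeholder data); it is not a reproduction of the IJS-SVR or RFR models cited above. Input scaling is included because the RBF kernel is distance-based.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Hypothetical dataset: lagged power output plus weather features -> next-hour PV power.
rng = np.random.default_rng(1)
X = rng.random((600, 5))
y = np.sin(2 * np.pi * X[:, 0]) + 0.2 * X[:, 1] + 0.05 * rng.standard_normal(600)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# SVR with an RBF kernel: epsilon sets the width of the tolerance tube,
# while C trades off model flatness against training errors.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.05))
model.fit(X_train, y_train)

print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```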
2. Classification: Classification, a form of supervised learning, involves using one or more input variables to predict a categorical output variable
[63]. Classification aims to find a function that can map the input variables to discrete output categories. The most widely used classification algorithms for predicting RES include logistic regression, decision trees, random forests, and support vector machines.
Logistic Regression: Logistic regression is a classification method that utilizes one or more input variables to forecast a binary output variable
[64][65]. It models the probability of the output variable being true or false using a sigmoid function. In renewable energy forecasting, logistic regression can be used to predict whether or not a specific event will occur, such as a solar or wind farm reaching a certain level of power output. For instance, Jagadeesh et al. (2020) used ML to develop a forecasting method for solar power output. They used a logistic regression model with 11 months of data, including plant output, solar radiation, and local temperature. The study found that selecting the appropriate solar variables is essential for precise forecasting. It also examined the algorithm’s precision and the likelihood of a facility generating electricity on a particular day in the future
[66].
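A minimal sketch of this idea follows, assuming scikit-learn and a synthetic dataset with an illustrative threshold-exceedance label; it does not reproduce the setup of Jagadeesh et al.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical dataset: daily irradiance, temperature, cloud cover -> did the
# plant exceed a chosen output threshold that day (1) or not (0)?
rng = np.random.default_rng(2)
X = rng.random((400, 3))
y = (X[:, 0] - 0.5 * X[:, 2] > 0.3).astype(int)   # placeholder label rule

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=2)

clf = LogisticRegression().fit(X_train, y_train)

# predict_proba exposes the sigmoid output: the estimated probability of the event.
proba = clf.predict_proba(X_test)[:, 1]
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
print("first 5 event probabilities:", np.round(proba[:5], 3))
```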
Decision Trees: An alternative classification method is decision trees, which involve dividing the input space into smaller sections based on input variable values and then assigning a label or value to each of these sections
[65]. Different studies have developed decision tree models to forecast power output from various renewable energy systems. Essama et al. (2018) developed a model to predict the power output of a photovoltaic (PV) system in Cocoa, Florida, USA, using weather parameters obtained from the United States National Renewable Energy Laboratory (NREL). By comparing the performance of ANN, random forest (RF), decision tree (DT), extreme gradient boosting (XGB), and LSTM algorithms, they aimed to fill a research gap in the area. They concluded that, although all of the algorithms performed well, the ANN was the most accurate method for forecasting PV solar power generation.
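A minimal decision tree sketch for power output prediction is shown below, assuming scikit-learn and synthetic placeholder features (not the NREL dataset used above).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Hypothetical dataset: weather features -> PV power output.
rng = np.random.default_rng(3)
X = rng.random((500, 4))
y = 2.0 * X[:, 0] * X[:, 1] + 0.1 * rng.standard_normal(500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)

# max_depth limits how finely the tree partitions the input space,
# which controls the bias/variance trade-off.
tree = DecisionTreeRegressor(max_depth=5, random_state=3).fit(X_train, y_train)
rmse = mean_squared_error(y_test, tree.predict(X_test)) ** 0.5
print("decision tree RMSE:", round(rmse, 3))
```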
Random Forest: Random forest is a supervised ML method that builds several decision trees and merges them to produce an effective and reliable prediction
[67]. The bagging technique, which is employed by random forests, reduces the variance of the base algorithms. This technique is particularly useful for forecasting time-series data
[68]. Random forest mitigates correlation between trees by introducing randomization in two ways: sampling from the training set and selecting from a random feature subset. The RF model grows each of the N trees independently as a complete binary tree, which enables parallel processing.
Vassallo et al. (2020) investigated optimal strategies for random forest (RF) modeling in wind speed/power forecasting. The study examined the use of RF as a corrective measure, compared direct versus recursive multi-step prediction, and assessed the impact of training data availability. Findings indicate that RF is more efficient when deployed as an error-correction tool for the persistence approach and that the direct forecasting strategy performs slightly better than the recursive strategy. Increased data availability consistently improved forecasting accuracy
[69]. In addition, Shi et al. (2018) put forward a two-stage feature selection process, coupled with a supervised random forest model, to address overfitting, weak reasoning, and poor generalization in neural network models when forecasting short-term wind power. The proposed methodology removes redundant features, selects relevant samples, and evaluates the performance of each decision tree. To address the inadequacies of the internal validation index, a new external validation index correlated with wind speed is introduced. Simulation examples and case studies demonstrate that the model outperforms other models in accuracy, efficiency, and robustness, especially for noisy data and wind power curtailment
[70]. Natarajan and Kumar (2015) also compared wind power forecasting methods. Physical methods rely on meteorological data and numerical weather prediction (NWP), while statistical methods such as ANN and SVM depend on historical wind speed data. Their experiments with the random forest algorithm found it more accurate than ANN for predicting wind power at wind farms
[71].
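The sketch below illustrates the bagging idea with scikit-learn's RandomForestRegressor for a direct one-step-ahead wind speed forecast; the series is synthetic and the lag construction is only an illustration of the direct strategy discussed above, not any of the cited models.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Hypothetical hourly wind-speed series turned into lag features for a direct
# one-step-ahead forecast.
rng = np.random.default_rng(4)
series = np.sin(np.arange(2000) * 0.1) + 0.1 * rng.standard_normal(2000)

n_lags = 24
X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
y = series[n_lags:]

split = int(0.8 * len(y))
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

# Each of the n_estimators trees is grown on a bootstrap sample (bagging) with a
# random feature subset, which decorrelates the trees and lowers variance.
rf = RandomForestRegressor(n_estimators=200, random_state=4, n_jobs=-1)
rf.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, rf.predict(X_test)))
```

A chronological split is used instead of random shuffling so that the evaluation respects the time ordering of the series.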
Support Vector Machines (SVM): SVM is a classification algorithm that identifies a hyperplane and maximizes the margin between the hyperplane and the nearest data points, akin to SVR
[72][73]. SVM has been utilized in renewable energy forecasting to estimate the power output of wind and solar farms by incorporating input features such as historical power output, weather data, and time of day. For instance, Zeng et al. (2022) propose a 2D least-squares SVM (LS-SVM) model for short-term solar power prediction. The model uses atmospheric transmissivity and meteorological variables and outperforms the reference autoregressive model and radial basis function neural network model in terms of prediction accuracy
[74]. R. Meenal and A. I. Selvakumar (2018) conducted studies comparing the accuracy of SVM, ANN, and empirical solar radiation models in forecasting monthly mean daily global solar radiation (GSR) in several Indian cities using varying input parameters. Using WEKA software, the authors determined the most significant parameters and concluded that the SVM model with the most influential input parameters yields superior performance compared to the other models
[75]. Generally, classification algorithms are used to predict categorical output variables, and regression techniques are used to predict continuous output variables. The particular task at hand and the properties of the data will determine which method is used.
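A minimal SVM classification sketch, assuming scikit-learn, an RBF kernel, and a synthetic high/low-generation label, illustrates the maximum-margin idea discussed above; it does not reproduce the LS-SVM or WEKA-based models cited in this subsection.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical dataset: weather features and hour of day -> "high" (1) vs "low" (0)
# generation class for a wind or solar plant.
rng = np.random.default_rng(5)
X = rng.random((500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0.8).astype(int)   # placeholder label rule

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=5)

# An RBF-kernel SVM searches for the maximum-margin boundary in the kernel space;
# inputs are standardized because the kernel is distance based.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```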
2.2. Unsupervised Learning
Another form of ML is unsupervised learning, where an algorithm is trained on an unlabeled dataset lacking known output variables, to uncover patterns, structures, or relationships within the data
[76][77][78]. Unsupervised learning algorithms can be primarily classified into two types, namely clustering and dimensionality reduction
[79].
Clustering: Clustering is an unsupervised learning method that groups related data points according to how close or similar they are to one another. Clustering algorithms, such as K-means clustering, hierarchical clustering, and density-based clustering, are commonly used in energy systems to identify natural groupings, or clusters, within the data; discovering these inherent patterns is the primary objective of clustering
[76][77]. K-means clustering is a widely used approach for dividing data into k clusters, where k is a user-defined number. The algorithm assigns each data point to the nearest cluster centroid and iteratively updates each centroid as the mean of the data points assigned to it
[76][77]. Hierarchical clustering is a family of algorithms that recursively merge or split clusters based on their similarity or distance, creating a hierarchical, tree-like structure of clusters. Density-based clustering, the other main family, groups data points that lie within a certain density threshold and separates them from regions of lower density
[76][77][78].
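A minimal clustering sketch follows, assuming scikit-learn and synthetic daily weather features; k = 3 is an illustrative choice standing in for regimes such as sunny, cloudy, and rainy days.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical dataset: one row per day with daily-mean irradiance, temperature,
# humidity, and wind speed; the goal is to group days into weather regimes.
rng = np.random.default_rng(6)
daily_weather = rng.random((365, 4))

X = StandardScaler().fit_transform(daily_weather)

# k is user-defined; each day is assigned to its nearest centroid, and centroids
# are refined iteratively as the mean of their assigned days.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=6).fit(X)

print("cluster sizes:", np.bincount(kmeans.labels_))
print("cluster centroids (standardized units):")
print(np.round(kmeans.cluster_centers_, 2))
```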
Dimensionality Reduction: Dimensionality reduction is an unsupervised learning technique used to reduce the number of input variables or features while retaining the significant information or structure in the data
[76][77][78]. The purpose of dimensionality reduction is to find a lower-dimensional representation of data that captures the majority of the variation or variance in the data. Principal component analysis (PCA), t-SNE, and autoencoders are some dimensionality reduction algorithms used in renewable energy forecasting
[79]. Principal component analysis (PCA) is a commonly utilized method for decreasing the dimensionality of a dataset. It does so by identifying the primary components or directions that have the most variability in the data and then mapping the data onto these components
[79]. t-SNE is a non-linear dimensionality reduction algorithm that is particularly useful for visualizing high-dimensional data in low-dimensional space. It uses a probabilistic approach to map similar data points to nearby points in low-dimensional space. Autoencoders are a type of neural network that can learn to encode and decode high-dimensional data in a lower-dimensional space. The encoder network is trained to condense the input data into a representation with fewer dimensions, and the decoder network is trained to reconstruct the original data from this condensed representation
[79].
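A minimal PCA sketch follows, assuming scikit-learn and synthetic correlated meteorological features; the 95% explained-variance target is an illustrative choice.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Hypothetical dataset: many (possibly correlated) meteorological measurements per sample.
rng = np.random.default_rng(7)
X = rng.random((1000, 12))

X_std = StandardScaler().fit_transform(X)

# Keep enough principal components to explain 95% of the variance,
# then project the data onto those components.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_std)

print("original dimensions:", X_std.shape[1])
print("retained components:", pca.n_components_)
print("explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))
```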
In general, unsupervised learning algorithms are particularly useful when there is a large amount of unstructured data that needs to be analyzed and when it is not clear what the specific target variable should be. Unsupervised learning has found various applications in the field of renewable energy forecasting, and one of its commonly used applications is the clustering of meteorological data
[80]. For example, in a study by J. Varanasi and M. Tripathi (2019), K-means clustering was used to group the days of the year (sunny, cloudy, and rainy days) into clusters based on similarity for short-term PV power generation forecasting
[81]. The resulting clusters were then used to train separate ML models for each cluster, which resulted in improved PV power forecasting accuracy. Unsupervised learning has also been used for anomaly detection in renewable energy forecasting. Anomaly detection refers to the task of pinpointing data points that exhibit notable deviations from the remaining dataset. In the context of renewable energy forecasting, anomaly detection can aid in identifying exceptional weather patterns or uncommon circumstances that may impact renewable energy generation. For example, in a study by Xu et al. (2015), the K-means algorithm was used to identify anomalous wind power output data, which was then employed to improve the accuracy of the wind power forecasting model
[82].
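The sketch below shows one simple way such K-means-based screening can be set up, flagging points that lie far from every centroid as potential anomalies; the synthetic wind power curve, the cluster count, and the 1% distance threshold are illustrative assumptions, not the procedure of Xu et al.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical dataset: paired wind speed and measured power samples; points far
# from every cluster centroid may indicate curtailment or sensor faults.
rng = np.random.default_rng(8)
wind = rng.uniform(0, 25, 2000)
power = np.clip(wind**3 / 25**3, 0, 1) + 0.05 * rng.standard_normal(2000)
X = StandardScaler().fit_transform(np.column_stack([wind, power]))

kmeans = KMeans(n_clusters=8, n_init=10, random_state=8).fit(X)

# Distance of each sample to its assigned centroid; flag the largest 1% as anomalous.
dist = np.linalg.norm(X - kmeans.cluster_centers_[kmeans.labels_], axis=1)
threshold = np.quantile(dist, 0.99)
anomalies = dist > threshold
print("flagged anomalies:", int(anomalies.sum()))
```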
In the realm of renewable energy forecasting, unsupervised learning has also been utilized for feature selection, which involves choosing a smaller set of pertinent features from a larger set of input variables. In this context, feature selection can reduce the computational complexity of ML models and improve the accuracy of renewable energy output predictions. For example, in a study by Scolari et al. (2015), K-means clustering was used to identify a representative subset of features for predicting solar power output
[83].
Overall, unsupervised learning is a powerful tool for analyzing large amounts of unstructured data in renewable energy forecasting. Clustering, anomaly detection, and feature selection are just a few of the many applications of unsupervised learning in this field, and new techniques are continually being developed to address the unique challenges of renewable energy forecasting.
2.3. Reinforcement Learning Algorithms
Reinforcement learning (RL) is a branch of ML in which an agent learns to make decisions in an environment to maximize a cumulative reward signal
The agent interacts with its environment by taking actions and receiving feedback in the form of rewards or penalties contingent on those actions
[86]. Some examples of RL algorithms are Q-learning, policy gradient, and actor-critic
[87][88]. Q-learning is an RL algorithm used for learning optimal policies for decision-making tasks by iteratively updating the Q-values, which represent the expected future rewards for each action in each state
[89]. Policy gradient is also an RL algorithm, used for learning policies directly without computing the Q-values
[90]. Actor-critic is another RL algorithm that combines elements of both value-based and policy-based methods by training an actor network to generate actions and a critic network to estimate the value of those actions
[90].
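A minimal tabular Q-learning sketch follows, with a toy placeholder environment (the states, actions, and reward are purely illustrative, e.g. a heavily simplified dispatch task); it shows the iterative Q-value update described above.

```python
import numpy as np

# Toy tabular Q-learning: a handful of discrete states and actions with a
# placeholder reward; only the update rule itself is the point here.
rng = np.random.default_rng(9)
n_states, n_actions = 5, 3
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1   # learning rate, discount factor, exploration rate

def step(state, action):
    """Placeholder environment: random transition and a reward favoring action 0."""
    next_state = rng.integers(n_states)
    reward = 1.0 if action == 0 else 0.0
    return next_state, reward

state = 0
for _ in range(5000):
    # epsilon-greedy action selection
    action = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[state]))
    next_state, reward = step(state, action)
    # Q-learning update: move Q(s, a) toward reward + gamma * max_a' Q(s', a')
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

print(np.round(Q, 2))
```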
Renewable energy forecasting is among the many tasks for which RL has been utilized
[88][91]. One approach to applying RL to renewable energy forecasting is to use it to control the operation of energy systems
[92]. For example, J. Sierra-García and M. Santos (2020) developed an advanced yaw control strategy for wind turbines based on RL
[93]. This approach uses a particle swarm optimization (PSO) and Pareto optimal front (PoF)-based algorithm to find optimal actions that balance power gain and mechanical loads, while the RL algorithm maximizes power generation and minimizes mechanical loads using an ANN. The strategy was validated with real wind data from Salt Lake City, Utah, and the NREL 5-MW reference wind turbine through FAST simulations
[93].
2.4. Deep Learning (DL)
DL is a type of ML that employs ANNs containing numerous layers to learn intricate data representations at multiple levels of abstraction. The term “deep” refers to the large number of layers in these ANNs, which can range from a few to hundreds or even thousands
[94]. DL algorithms can learn to recognize patterns and relationships in data through a process known as “training.” During training, the weights of the links between neurons in an ANN are changed to reduce the disparity between the anticipated and actual output
[95]. DL has brought about significant transformations in several domains, including energy systems, computer vision, natural language processing, speech recognition, image recognition, game playing, and autonomous systems, where it has enabled remarkable advancements
[80].
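A minimal sketch of the training process described above follows, assuming TensorFlow/Keras and synthetic weather-to-power data; the layer sizes, epochs, and features are illustrative only.

```python
import numpy as np
import tensorflow as tf

# Hypothetical dataset: hourly weather features -> normalized plant power output.
rng = np.random.default_rng(10)
X = rng.random((2000, 6)).astype("float32")
y = (0.6 * X[:, 0] + 0.4 * X[:, 1] ** 2).astype("float32")

# A small fully connected network; training adjusts the layer weights to shrink
# the gap between predicted and measured output, as described above.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(6,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=20, batch_size=32, validation_split=0.2, verbose=0)
print("MSE on the full dataset:", model.evaluate(X, y, verbose=0))
```

Deeper architectures such as CNNs and LSTMs, discussed in the forecasting literature cited earlier, follow the same train-by-gradient-descent principle with different layer types.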