2. Machine Learning Technologies for a Solar Plant’s System
Real-life solar plant systems have complex, nonlinear dynamics because system parameters and insolation vary. ML methods have therefore been proposed to approximate these dynamics. Recent studies [1,2,5,6,7,8,9,10,11,12,13] indicate that ML technologies for a solar plant’s design, forecasting, maintenance, and control increase the effectiveness and reliability of the plant compared with conventional methods. In the smart sensor systems of solar plants, ML methods are key components for improving the quality of dataset processing for these tasks. SCADA (supervisory control and data acquisition) is a control system architecture that uses sensors, programmable logic, and discrete PID controllers to control the processes of a solar plant system. The solar plant’s system includes advanced sensors, and big data from SCADA are collected 24/7. Combined with weather big data, this enables the creation of ML technologies that solve complex tasks of a solar plant’s design, forecasting, maintenance, and control.
2.1. ML Sensor System of a Solar Plant
Smart models based on ML technologies can exploit parallel computation on modern graphics processing units, which significantly decreases the time needed to process SCADA datasets for solar plant design, forecasting, maintenance, and control [12].
Reliability, accuracy, and other required quality parameters together make up the performance of an ML model. To reach optimal performance, the model must be built from high-quality datasets [14].
Figure 1 shows the basic life cycle of an ML sensor model. Smart model creation has two phases: data preparation (DP phase) and model creation (MC phase). Both should follow the Cross-Industry Standard Process for Data Mining (CRISP-DM) cycle [15] and the Open Neural Network Exchange (ONNX) format [16]. The CRISP-DM cycle [15] provides a pipeline for the implementation of smart models in real-time scenarios.
Figure 1. A basic life cycle of an ML sensor model.
The sensor data of a solar plant are compiled into raw SCADA datasets. These datasets are then preprocessed (Figure 1), in the simplest case by standardization or encoding. Data preparation methods include dimensionality reduction (e.g., principal component analysis (PCA)), sampling (subsampling, oversampling), transformation, encoding, feature extraction, and feature selection [14].
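The two simple preprocessing steps named above, standardization and encoding, can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the cited works: the column names, the sample readings, and the helper functions `standardize` and `one_hot` are hypothetical, and only the Python standard library is used.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale a numeric column to zero mean and unit variance (z-score)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """Encode categorical labels as 0/1 vectors, one position per sorted category."""
    categories = sorted(set(labels))
    vectors = [[1 if label == c else 0 for c in categories] for label in labels]
    return vectors, categories


# Hypothetical raw SCADA readings (assumed values, for illustration only).
irradiance_w_m2 = [0.0, 250.0, 500.0, 750.0, 1000.0]
inverter_state = ["run", "idle", "run", "fault", "run"]

irradiance_z = standardize(irradiance_w_m2)
state_vectors, state_categories = one_hot(inverter_state)
```

Standardization puts sensors with different units on a common scale, while one-hot encoding turns categorical status codes into numeric inputs that a model can consume.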
Feature extraction is a crucial step in creating a smart sensor system because it provides the knowledge from which the ML model is built [14]. The data mining (DM) methods generate features. The most relevant data are then split into training, validation, and test datasets (Figure 1). An ML model that solves either a classification or a regression task is trained on the training dataset. When the smart model reaches the required performance, its weights are frozen. The ML frameworks reviewed in Section 2.3 automate the MC phase, including validation (Figure 1). The trained ML model is then deployed and monitored. If the monitored model does not provide optimal performance, it is retrained on updated datasets.
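Two steps of this life cycle, splitting the data and deciding when to retrain, can be sketched as below. The split fractions, the error threshold, and the function names `chronological_split` and `needs_retraining` are assumptions made for illustration. A chronological split is also an assumption: it is a common choice for time-ordered SCADA data, but the source does not prescribe it.

```python
def chronological_split(samples, train_frac=0.7, val_frac=0.15):
    """Split time-ordered samples into train, validation, and test parts without shuffling."""
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    n = len(samples)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return samples[:train_end], samples[train_end:val_end], samples[val_end:]


def needs_retraining(recent_errors, threshold):
    """Flag a deployed model when its mean recent error exceeds the allowed threshold."""
    return sum(recent_errors) / len(recent_errors) > threshold


# Hypothetical time-ordered sample indices.
samples = list(range(100))
train, val, test = chronological_split(samples)

# Hypothetical monitoring window of absolute forecast errors (kW).
retrain = needs_retraining([4.0, 6.0, 8.0], threshold=5.0)
```

Keeping the test part strictly later in time than the training part avoids evaluating the model on data that precede what it was trained on.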
2.2. ML Methods for Smart Sensor Creation
An ML sensor model can be developed based on neural network (NN) or non-NN algorithms [14]. The latter include PCA, Random Forest (RF), support vector machine (SVM), and Decision Tree (DT). In contrast with non-NN methods, NN architectures can combine various types of neurons, which are specified by ONNX [17], and offer highly effective learning and feature extraction. Deep learning/deep neural networks (DL/DNNs), such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and transformers, are ML methods with feature learning that use multiple layers, complex connectivity architectures, and different transfer operators to automatically mine meta-features from the input. NNs such as artificial neural networks (ANNs), radial basis function neural networks (RBF-NNs), generative adversarial networks (GANs), RNNs, and CNNs have recently made major progress in practical applications of solar energy [1].
Figure 2 shows the two classes of NN methods and the groups of ML methods according to the task they solve for a solar plant system [2].
Figure 2. Classification of tasks that are solved based on ML methods.
The types of ensembles are bagging, boosting, and stacking/blending [18,19]. Table 1 presents a comparison of ensemble techniques [18]. There are constant and dynamic weighting ensemble approaches. In recent studies, the most used ensemble methods are RF, Extreme Gradient Boosting (XGBoost), Extreme Learning Machine (ELM), etc.
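The difference between constant and dynamic weighting can be sketched as follows. Weighting each base model by the inverse of its recent absolute error is only one possible dynamic scheme and is an assumption here, as are the function names and the forecast values. The cited works may use other rules.

```python
def weighted_average(predictions, weights):
    """Combine base-model predictions with the given non-negative weights."""
    total = sum(weights)
    return sum(p * w for p, w in zip(predictions, weights)) / total


def inverse_error_weights(recent_abs_errors, eps=1e-9):
    """Dynamic weights: a model with a smaller recent error gets a larger weight."""
    return [1.0 / (e + eps) for e in recent_abs_errors]


# Hypothetical power forecasts (kW) from three base models for one time step.
predictions = [410.0, 430.0, 390.0]

# Constant weighting: fixed, here equal, weights for every time step.
constant_forecast = weighted_average(predictions, [1.0, 1.0, 1.0])

# Dynamic weighting: weights recomputed from each model's recent absolute error (kW).
dynamic_forecast = weighted_average(predictions, inverse_error_weights([10.0, 20.0, 40.0]))
```

With constant weights the combination rule never changes, whereas the dynamic variant shifts the forecast toward whichever base model has recently been most accurate.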
Model training methods that optimize performance include quasi-Newton methods, stochastic gradient descent (SGD), evolutionary computation, genetic programming, etc. [15]. Creating the ML model is the most complex and important task: it includes designing an optimal model architecture and requires multidimensional global optimization (GO).
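As an illustration of one of the training methods listed above, the sketch below fits a one-variable linear model with SGD on the squared error. The synthetic data, the learning rate, the epoch count, and the function name `sgd_linear_fit` are assumptions chosen so that the toy problem converges. Real solar plant models are far larger and are normally trained with an ML framework.

```python
import random


def sgd_linear_fit(xs, ys, lr=0.1, epochs=200, seed=0):
    """Fit y = w * x + b by SGD on the squared error, one sample per update."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            err = (w * xs[i] + b) - ys[i]
            # Gradient of err**2 with respect to w and b.
            w -= lr * 2 * err * xs[i]
            b -= lr * 2 * err
    return w, b


# Synthetic, noise-free data generated from y = 2x + 1 (assumed toy problem).
xs = [i / 19 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_fit(xs, ys)
```

Each update uses a single sample, which keeps the cost per step low. This is the property that makes SGD practical for large SCADA datasets.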
Bias and variance characterize the effectiveness of a model. Reducing a model’s bias typically comes at the expense of higher variance, and vice versa. The performance of ML models strongly depends on how representative the dataset is. Many techniques support model evaluation, including cross-validation (e.g., k-fold) and holdout, with performance measures such as accuracy (ACC), precision, recall, F1, Matthews correlation coefficient (MCC), receiver operating characteristic (ROC), area under the curve (AUC), mean squared error (MSE), mean absolute error (MAE), and root-MSE (RMSE). Relative errors, such as normalized RMSE (nRMSE) and normalized MAE (nMAE), facilitate the comparison between models that are tuned on datasets with different scales.
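Several of these measures can be written out directly, as in the sketch below. Normalizing RMSE by the range of the observed values is one common convention and is an assumption here, since other works normalize by the mean or by installed capacity. The sample values and confusion-matrix counts are hypothetical.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def nrmse(y_true, y_pred):
    """RMSE divided by the range of the observed values (assumed normalization)."""
    return rmse(y_true, y_pred) / (max(y_true) - min(y_true))


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, from confusion-matrix counts."""
    return 2 * tp / (2 * tp + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical measured and forecast power, first in kW and then in W.
power_kw_true = [100.0, 200.0, 300.0, 400.0]
power_kw_pred = [110.0, 190.0, 320.0, 380.0]
power_w_true = [1000 * v for v in power_kw_true]
power_w_pred = [1000 * v for v in power_kw_pred]
```

In this example RMSE grows by a factor of 1000 when the same data are expressed in W instead of kW, while nRMSE stays the same, which is why relative errors suit comparisons across datasets with different scales.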
To develop intelligent models for sizing, forecasting, and control of a solar plant system, and to make an RNN more adaptive with regard to a task’s complexity and the overfitting problem, researchers developed an MFNN [5,6,7,8]. The MFNN includes RNNs with fuzzy units and/or a convolutional block to process images. In contrast to an Adaptive Network-Based Fuzzy Inference System (ANFIS), an RNN approximates the membership function.
Researchers combined the modified multidimensional quantum-behaved particle swarm optimization (PSO) with the Levenberg–Marquardt algorithm (MD QPSO) and developed a hierarchical encoder of the particle’s dimension component [5,6,7,8] to automatically create an optimal architecture of an MFNN and improve the convergence.
Table 1. Comparison of ensemble techniques.
Researchers implemented an MFNN and its life cycle, which includes automatic creation and self-adaptation, as an intelligent framework based on the researchers’ own software [20]. This intelligent framework provides the automatic creation of the optimum architecture of an MFNN with regard to a task’s complexity.
All the above-mentioned ML methods and algorithms are implemented as software in ML frameworks, which serve as tools for creating a smart sensor system.
2.3. ML Frameworks
ML frameworks implement many ML methods [15]. Table 2 shows a comparison of popular ML frameworks.
Big data ecosystems, namely Apache Flink, Apache Spark, and Cloudera Oryx 2, include built-in ML libraries for large-scale DM. These ML libraries are still evolving, but the potential of the ecosystem as a whole is significant.
Google, Facebook, and Microsoft developed most of the DL frameworks that support ONNX, namely PyTorch, TensorFlow, Caffe2, Microsoft CNTK, and MXNet.
Chainer, Theano, Deeplearning4j, and H2O are also appropriate DL libraries and frameworks for smart sensor system creation.
The high-level DL wrapper libraries such as Keras, TensorLayer, and Gluon are developed on top of the DL frameworks. They provide a simpler but more computationally expensive way for smart sensor system creation.
The ML frameworks automate the MC phase of an ML model, including validation (Figure 1). An ML sensor system can be implemented as software based on an ML framework that supports ONNX. Such an implementation gives the developed ML sensor system flexibility and all the advantages of the ML framework.
Table 2. Comparison of ML frameworks.
2.4. Open Resources for ML Research in a Solar Plant System
Open solar energy data sources, including big data, support the development of cutting-edge ML technologies in solar energy.
The GitHub repositories [21,22] implement maximum power point tracking (MPPT) systems [21] and management of cities’ demand/load [22] based on the open-source Gym toolkit [23]. The open-source tool pymgrid [24] supports the creation and simulation of various microgrids. Octave [25] and Scilab [26] are open-source tools that are compatible with MATLAB.
Table 3 presents a brief description of the open datasets to implement and validate ML solar plant systems.
Table 3. The open datasets to implement and validate ML solar plant systems.
Open Dataset | Data Source Location | Description
Duke California Solar Array Dataset [27] | - | Over 400 km² of imagery and 16,000 hand-labeled solar arrays
SOLETE [28] | City: Roskilde, Denmark. Latitude and longitude: 55.6867, 12.0985 | Meteorological and active power 15 months dataset from PV array
Desert Knowledge Australia Center Dataset [29] | - | Data of solar technologies spanning multiple types, ages, models, and configurations
Girasol [30] | Albuquerque, USA | A meteorological (10 min sampling interval), insolation (a sampling rate ranging from 4 to 6 samples per second), and images (sampling interval of the cameras is 15 s) 242 days (of 3 years) dataset
ESOLMET-IER Dataset [31] | Institute of Renewable Energies UNAM, station “ESOLMET-IER” | Solar metric and meteorological dataset
The National Solar Radiation Data Base (NSRDB) [32] | The USA and neighboring countries | Solar insolation and meteorological 23 years dataset
Photovoltaic Thermal Images Dataset [33] | 66 MW PV plant in Tomboruk | Thermal images of PV arrays with the presence of one or more anomaly cells and their respective masks
Pecan Street Dataset [34] | - | 1300 customer loads one-year dataset