Short-Term Load Forecasting Method: Comparison
Please note this is a comparison between Version 1 by Filipe Martins Rodrigues and Version 2 by Jason Zhu.
Electricity consumption varies over short-, medium-, and long-term periods, which correspond to three forecast time horizons: from one hour to a week, from one week to a year, and beyond a year, respectively. Consumption depends on several factors, such as climate, region, and sector (residential, industrial, commercial). Residential loads consume most of the electricity generated. Short-Term Load Forecasting (STLF) is important for cost reduction, energy savings, fine scheduling, and safety assurance.
  • STLF
  • electricity
  • residential (household)
  • artificial intelligence

1. Introduction

Electrical short-term load forecasting usually covers hourly forecast horizons of up to one week [1]. This period is crucial to the decision-making of grid utilities managing small- to large-scale electricity grids, including cases where countries or groups of countries share common energy systems, such as the European Union [1][2][3].
In the analyzed literature, electricity demand forecasting has received considerable attention from researchers in different countries due to its essential contribution to the planning and management of the electrical system [4][5]. This attention has mostly been oriented toward residential agglomerates. In the last two years, however, researchers have increasingly worked on individual residences, a task that induces greater variability in the results.
Generally, the load profile in the residential sector follows cyclical and seasonal patterns related to the activities of residents and generates a time series of real consumption [6]. The dynamics of this system can vary significantly during the observation period (calendar effects), depending on the nature of the system and on external influences (weather, occupancy, socioeconomic environment). This causes forecasting accuracy to vary between samples, even when the same forecasting model is used. In the last decade, STLF solutions for the residential sector have used approaches based on different models, each with different advantages and disadvantages in terms of forecasting accuracy, training complexity, sensitivity to parameters, and limitations in the expected forecasting horizon [6].
In the analyzed literature, statistical and machine learning models are usually used for short-term residential load forecasting, although the boundary between the two is becoming increasingly blurred as multidisciplinary collaboration in the scientific community grows [7].

2. Synthesizing Residential Demand Forecasting

The load of a single household is less predictable than that of a cluster or residential agglomerate, since the latter has a more aggregated load profile [8][9]. Some of the forecasting studies presented found that the standard forecasting error decreased as the number of households in the cluster or agglomerate increased, because aggregation smooths the residential load profile [8]. Some of the studies proposed classifying residences by similar load profile patterns, suggesting that this eliminates part of the noise [8]. In the analyzed articles, historical load data (38 articles), calendar effects (26 articles), and weather data (22 articles) are the main input sources for load forecasting models. Others (8 articles) employ simulations of the random use of various appliances to generate the demand profile of a residence. In residential clusters, adding sociodemographic factors (3 articles) helps increase the performance of the forecasting model.
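The smoothing effect of aggregation can be illustrated with a small simulation. This is an illustrative sketch, not from the reviewed studies: the household profiles are synthetic Gaussian noise, and the household count and noise level are arbitrary assumptions.

```python
import random
import statistics

random.seed(42)

def household_profile(n_hours=168):
    """Simulate a noisy weekly load profile (kWh per hour) for one household."""
    return [max(0.1, random.gauss(1.0, 0.6)) for _ in range(n_hours)]

def coeff_of_variation(profile):
    """Relative variability of a profile: standard deviation over mean."""
    return statistics.stdev(profile) / statistics.mean(profile)

single = household_profile()

# Aggregate 50 households hour by hour: independent fluctuations partly cancel.
cluster = [sum(h) for h in zip(*(household_profile() for _ in range(50)))]

# The aggregated profile has much lower relative variability than a single home.
print(coeff_of_variation(single) > coeff_of_variation(cluster))
```

Relative variability of the aggregate shrinks roughly with the square root of the number of households, which is consistent with the reported drop in forecasting error for larger clusters.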
In the analyzed literature, historical load and weather data are at the center of load forecasting models [8]. Using all available data was a common approach; if the dataset was incomplete, the longest periods with complete information were selected to build the forecasting model [8]. In the papers analyzed in this systematic review, the authors proposed forecasting models using datasets ranging from two weeks to ten years.
The advantages of using more historical data are analyzed in the forecasting models developed; this analysis matters most for models with independent and dependent variables [8]. Additional data that does not improve the accuracy of the forecasting model should be avoided, because it can create noise, impair performance, and require more computational time and power [8]. In general, a larger historical dataset reduces the forecasting error: reducing the amount of historical data degrades the training of forecasting models and thus lowers accuracy. However, multi-year data can introduce comparable accuracy errors, as families' habits and lifestyles generally change over time.
Calendar effects promote variations in the load profile related to the days of the week, holidays, or calendar periods. Using calendar effects in load forecasting identified weekly and seasonal energy consumption patterns and enabled the forecasting of peak demand [8]. In the analyzed literature, 26 articles addressed the interaction between residential load and calendar effects in the forecasting models presented. To recognize similarities in load variation across different periods of the week, they used dummy variables, mostly binary, to characterize each hour of the day, the day of the week, and the weekend. Other authors considered seasonal variability (winter–summer). Historical data were split into subsets covering the same day of the week or certain hours of the day. Adding calendar effects produced a smaller error in the AI-based forecasting models, but for the statistical methods no significant differences between the proposed models were observed.
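The binary dummy-variable encoding described above can be sketched as follows. This is a minimal illustration using Python's standard library; the exact feature layout (one-hot hour, one-hot weekday, weekend flag) is an assumption, as the reviewed papers vary in their encodings.

```python
from datetime import datetime

def calendar_features(ts: datetime):
    """Encode calendar effects as binary dummy variables:
    one-hot hour of day (24), one-hot day of week (7), weekend flag (1)."""
    hour = [1 if ts.hour == h else 0 for h in range(24)]
    weekday = [1 if ts.weekday() == d else 0 for d in range(7)]
    weekend = [1 if ts.weekday() >= 5 else 0]  # Saturday=5, Sunday=6
    return hour + weekday + weekend  # 32-dimensional feature vector

# Saturday 7 January 2023, 18:00 -> hour-18 and Saturday dummies set, weekend=1
features = calendar_features(datetime(2023, 1, 7, 18))
print(len(features), features[-1])  # 32 1
```

Such vectors are typically concatenated with lagged load and weather values to form the model input.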
However, to improve the performance of the proposed statistical models, the authors added other variables, such as sociodemographic factors [10], energy prices [11], occupancy behavior [10][11], and home appliances [10][11], which brought some improvements for residential agglomerates.
Most short-term load forecasting models are trained and tested at forecast intervals of 15, 30, or 60 min. The forecast is usually provided for the next day, the next 24 h, hourly, or weekly [12]. For individual residences, the forecast provided is daily or weekly, while for residential clusters and agglomerates, the periods are usually monthly and seasonal. The choice of a 24-h horizon may be related to the operation of the day-ahead electricity market [8]. For residences with self-production systems, the day-ahead forecast is equally important for decision-making in the operation of the self-production system [8].

3. Forecasting Models Comparative Analysis

3.1. Artificial Intelligence Method

The latest developments in the fields of data science and artificial intelligence have led to research on energy consumption forecasting using historical data on consumption, behavior, and weather conditions. From this perspective, several AI approaches applied to STLF were developed, which can be divided into four groups based on the methodological nature of the algorithms: ML, ANN, DL, and hybrid [13]. A complete list of identified articles that used AI in their forecasting methods is presented in Appendix A.
The most-used AI algorithms are ANN, SVR, and DL models [14]. ANN achieves very good results on nonlinear systems and is widely used in STLF solutions [14]. However, ANN suffers from settling in local minima and from overfitting [14]. To avoid overfitting, authors increased the amount of data, applied dropout, and trained with momentum [4]. Training a neural network consists of modifying its parameters through gradient descent optimization, which minimizes a given loss function quantifying the network's accuracy on the desired task [6], reducing the training error [15]. SVR is a statistical machine learning approach that has been successful in electricity load forecasting [14]. Through the principle of structural risk minimization, SVR can accurately obtain the best overall solution on the sample. However, SVR works well with small data samples but performs worse on larger datasets [16]. DL allows modelling high-level abstractions and recognizing and extracting hidden invariant structures and intrinsic characteristics of the data [13]. This flexibility has a cost: DL architectures require a significant amount of data to outperform other approaches, training is computationally intensive, and interpretation is not easy [4]. DL works well on certain types of STLF solutions, and arbitrarily increasing the depth of an ANN does not always produce the best results [4].
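Gradient descent with momentum, as mentioned above, can be shown on the smallest possible case. This is an illustrative sketch on a one-parameter linear model with made-up data; the learning rate and momentum coefficient are arbitrary assumptions, not values from the reviewed papers.

```python
# Gradient descent with momentum minimizing mean squared error for y ≈ w*x.
# Toy data, roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

w, velocity = 0.0, 0.0
lr, momentum = 0.01, 0.9

for _ in range(200):
    # dL/dw for L = (1/n) * sum((w*x - y)^2)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    # Momentum accumulates past gradients, damping oscillations.
    velocity = momentum * velocity - lr * grad
    w += velocity

print(round(w, 2))  # converges close to 2.0
```

The same loop, applied to thousands of weights and a loss over load data, is what "training a neural network" amounts to; dropout and more data act on the overfitting problem, while momentum helps escape shallow local minima.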

ML

The ML approach includes a set of methods that try to learn from historical data [17]. This group of AI techniques involves models that can automatically identify patterns in the data and then use them to predict and develop techniques that contribute to decision-making in an uncertain environment [17]. The application of ML models in STLF emerged to identify the correlation between input and output data and has been used to address the drawbacks presented by traditional techniques [13].

ANN

The ANN approach is widely used in STLF solutions due to its high forecasting accuracy [18], and has therefore gained great popularity for solving forecasting problems in STLF [19]. ANN is an information processing approach inspired by the behavior of neurons in biological nervous systems and by how they are interconnected [19]. The model is composed of layers of artificial neurons: a typical ANN has several input neurons, several hidden layers, and usually a single output neuron, with a specific weight assigned to each connection between them. The ANN model learns from supervised examples in the input dataset and is later used to label new, never-before-seen datasets with similar characteristics [19].
The main feature of a neural network is its ability to learn automatically from the environment and adjust its performance through learning [20]. ANNs can converge easily after training with an appropriate number of samples, and can then produce outputs with very small error on new input data they have never seen before. Another interesting property is that ANNs tolerate noisy data [20].
Learning algorithms enable ANNs to approximate any continuous function to any desired precision by creating internal ANN representations, avoiding the use of explicit mathematical models to describe the input–output relationships [20].

DL

DL-based models are a class of ML algorithms that in recent years have been widely used in STLF [14]. DL systems are based on ANNs, and in time series modelling, the recurrent neural network (RNN) is the most widely used DL architecture for residential load forecasting [21][22]. Although some of the reviewed works categorize RNNs as unsupervised (where only the inputs are given) [4][17][23][24], for load forecasting they are typically trained on past input–output sequences. RNNs feed the state values of the previous neurons into the next neurons to perform time series data mining and have provided new approaches to STLF solutions. RNNs have demonstrated the ability to forecast medium- and long-term electricity consumption of residences at 1-h intervals, with relative errors lower than those of common multilayer networks. However, during training, RNNs are prone to gradient problems, limiting their application in load forecasting systems [14].
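The state-passing mechanism that distinguishes RNNs can be sketched with a single recurrent unit. This is a toy illustration, not an architecture from the reviewed papers: the weights and the short load sequence are arbitrary assumptions.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One step of a single-unit vanilla RNN: the previous hidden state
    h_prev is fed into the next step, letting the unit carry information
    along a load time series."""
    return math.tanh(w_x * x + w_h * h_prev + b)

# Toy scaled hourly loads; weights chosen only for illustration.
loads = [0.4, 0.5, 0.9, 1.2, 0.8]
h = 0.0
for x in loads:
    h = rnn_step(x, h, w_x=0.7, w_h=0.5, b=0.0)
print(round(h, 3))  # final hidden state summarizes the whole sequence
```

Because the gradient of the loss is propagated back through every `h_prev` multiplication, repeated factors below or above one shrink or blow up the gradient over long sequences, which is the vanishing/exploding gradient problem mentioned above.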
Given this limitation, and to better deal with long-term dependencies, LSTM networks were proposed, at the cost of higher computational expense [14]. To prevent the gradient from vanishing or exploding after many time steps, the traditional hidden nodes are replaced with memory modules [14]. CNNs can identify nonlinear relationships between adjacent samples in local regions. CNN is a supervised learning algorithm and can be used when sufficient data can be labelled [17]. In supervised learning, the goal is to learn a mapping between the input vector and the outputs from an existing labelled set of input–output pairs. CNNs are skilled at assembling features and extracting complex relationships using convolutional operators and nonlinear activation functions in the hidden layers [18].
LSTM and CNN are two different neural network architectures that can be combined to overcome some of the limitations of each approach. For example, LSTM networks are particularly good at handling long-term dependencies in sequences of data but can be computationally expensive. On the other hand, CNNs are better suited for identifying nonlinear relationships in local regions of data but require labelled data for supervised learning [22].
By combining LSTM and CNN architectures, it is possible to exploit the strengths of each approach while mitigating its weaknesses. A CNN, for example, can be used as a feature extractor for image data, with the extracted features subsequently passed into an LSTM for sequence processing. The LSTM can then take advantage of the rich, sophisticated feature representations learned by the CNN to better handle long-term dependencies in the data. Combining the two architectures enables more robust and adaptable machine-learning models that can handle a variety of tasks, including natural language processing and image recognition, with higher accuracy and generalization performance across a wide range of applications [22].

SVM

In machine learning, the SVM is a supervised learning method with associated learning algorithms that analyze data for classification and regression analysis [7][25]. SVM is one of the most robust forecasting methods, based on statistical learning structures [26].
The SVM analyzes the data and classifies it into one of two categories. Its task is to determine which category a new data point belongs to, making it a non-probabilistic binary linear classifier [27]. It is trained with a series of data already classified into two categories, building the model during initial training and generating a map of the classified data. The SVM performs linear classification and can also efficiently perform nonlinear classification using the kernel trick, implicitly mapping its inputs into high-dimensional feature spaces [12][28][29].
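The kernel trick mentioned above rests on computing similarities in an implicit high-dimensional space without ever constructing that space. A minimal sketch with the Gaussian (RBF) kernel, a common choice (the `gamma` value here is an arbitrary assumption):

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel: inner product in an implicit, infinite-
    dimensional feature space, computed directly from the squared
    distance between the original input vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # 1.0: identical points
print(rbf_kernel([1.0, 2.0], [4.0, 6.0]))  # near 0: distant points
```

An SVM (or SVR) replaces plain dot products between samples with such kernel evaluations, which is what lets a linear decision procedure separate nonlinearly distributed load patterns.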
The main benefits of the SVM approach are that it is robust to noise and reduces overfitting and underfitting while reaching the global minimum of the objective function. The SVM handles over- and under-fitting of the training samples by minimizing both the training error and the regularization term.

Hybrid Algorithms

The purpose of hybrid algorithms is to develop a forecasting model that uses optimization algorithms or pre-processing techniques to optimize the model parameters. This leads to methodologies that attenuate or eliminate noise in the pre-processing of the electrical load and extract relevant characteristics of the time series by decomposing the original series. Most classic and original models have flaws and do not reach the intended level of precision, so researchers propose hybrid models to take advantage of each component model and smooth out the forecasting errors of the original forecasting model [30].
In recent years, optimization algorithms have been widely used in load forecasting research to improve the performance of forecasting models. While particle swarm optimization and differential evolution have been mentioned as popular algorithms for this purpose, this systematic review of STLF for households found that the reviewed papers used optimization algorithms other than those popular in the literature. Proposals for hybrid solutions combining different techniques to optimize and improve forecasting results were also identified. These approaches, reported in several of the reviewed papers, have shown promising results in load forecasting.

3.2. Statistical Methods

In recent decades, extensive research has been carried out on the development of models based on statistical methods to define the methodologies applied to the forecasting of electricity demand in STLF. The models used in the statistical methods correlate energy consumption or an energy index with influence variables [31][32]. These models are developed from sufficient historical time series data and, like the ML models, are subsequently trained. Among these historical data, simplified variables such as weather data are used to correlate energy consumption and obtain an energy characteristic that stands out [31]. The selection of statistical models depends on the variation of the input data and the expected period. However, statistical methods cannot deal effectively with the system dynamics due to nonlinearities induced by nonlinear data, such as calendar effects (weekday, weekend) or seasonality, which affect the electrical load profile [28][33].
The most used models for the analysis and forecasting of time series are the Autoregressive Integrated Moving Average (ARIMA) and SARIMA, an ARIMA in which the "S" stands for seasonality [28][33]. The autoregressive (AR) model is a linear regression of the current value on one or more previous values. Like the AR model, the Moving Average (MA) model is a linear regression [34]; the difference is that it regresses the current value on the noise or errors of one or more past values [34]. SARIMA needs only the past values of a non-stationary time series, is adaptable, and can handle seasonality [34].
ARIMA is the most popular and mature of these approaches due to its adaptability to linear patterns and its simple algorithm. However, the residential electricity consumption profile is non-stationary [33]. The ARIMA and SARIMA models use lagged values of the STLF time series data to convert non-stationary data into stationary data.
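The conversion from non-stationary to stationary data is done by differencing, the "I" (integrated) part of ARIMA. A minimal sketch on a made-up trending series (the numbers are illustrative only):

```python
def difference(series, lag=1):
    """First-order differencing: subtracting the value `lag` steps back
    removes a linear trend, pushing the series toward stationarity."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# A trending (non-stationary) toy load series: the level rises step by step.
trending = [10.0, 11.2, 12.1, 13.3, 14.0, 15.2]
stationary = difference(trending)
print(stationary)  # fluctuates around a constant (~1.0) instead of trending
```

For seasonal patterns, SARIMA applies the same operation with a seasonal lag (e.g. `lag=24` for hourly data with a daily cycle) before fitting the AR and MA terms.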
In the analyzed literature, other statistical models and their variations are also studied, such as adaptations of Bayesian inference, Gaussian processes, and wavelets. However, even with the adjustments introduced by the researchers, the statistical algorithms remain limited in identifying the temporal variations and non-linear patterns of residential electrical load required by STLF solutions, and are inappropriate for an individual residence.

3.3. Time Series Analysis

Statistical models are the simplest and use time series trend analysis to forecast future energy needs. In the analyzed studies, models are proposed for STLF, whose horizon varies from one hour to one week, and the input variables used are time series of historical electricity consumption, weather, appliance, and socio-economic data. Medium-term forecasts usually range from one week to a year. However, depending on the forecasting granularity requested, models based on statistical methods generally perform poorly over longer-term periods (>1 year), consume a large amount of computation time, and require a good understanding of the underlying statistics.

3.4. Performance Analysis Metrics

For performance evaluation, the various articles apply multiple statistical error metrics, namely MAPE, RMSE, and MAE. However, several authors express concerns that traditional error assessment approaches are not adequate for predicting household load, due to high errors at values close to zero or to scale problems caused by differences in load profiles between households [35]. MAPE's limitations, such as difficulties in handling small or zero denominators, are less relevant in traditional load forecasting problems, because the aggregate load is rarely zero or close to a very small number.
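The near-zero problem with MAPE can be seen directly from the metric definitions. A minimal sketch with made-up values (a single near-zero household load is the assumed trigger):

```python
def mae(actual, forecast):
    """Mean Absolute Error: average absolute deviation, in load units."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root Mean Squared Error: penalizes large deviations more heavily."""
    return (sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)) ** 0.5

def mape(actual, forecast):
    """Mean Absolute Percentage Error: undefined for zero actuals and
    inflated when actuals are close to zero."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

actual = [0.05, 2.0, 3.0]     # hour 1: a near-zero household load
forecast = [0.25, 2.1, 2.9]

print(round(mae(actual, forecast), 3))   # small absolute error
print(round(mape(actual, forecast), 1))  # blown up by the near-zero actual
```

Here a 0.2 kWh miss on a 0.05 kWh actual contributes a 400% term, dominating MAPE even though MAE and RMSE stay small, which is exactly why MAPE is problematic for individual households but tolerable for aggregate load.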