Load prediction with higher accuracy and less computing power has become an important problem in the smart grid domain in general, and especially in demand-side management (DSM), as it can help limit global warming and better integrate renewable energies. Artificial neural networks (ANN) are the most used methods for forecasting electrical load and are widely employed in this field for their numerous advantages. The complexity of this task is considerable because several factors, such as weather and holidays, affect the load through both linear and non-linear relationships, which makes it well suited to ANNs and their capacity to deal with non-linear relationships.
In December 2015, 196 countries agreed on an international treaty for limiting global climate change by reducing global warming. The main goal is to limit global warming to well below 2 °C by limiting the use of fossil fuels. Due to both the availability and the low price of electricity in first world countries, the growing number of devices that need electricity to operate, and the appearance of new ones such as electric cars, electrical grids and their growing effect on nature have become a significant concern.
Consequently, policies for transitioning away from fossil fuels such as coal have been discussed, alongside suggestions to increase the use of renewable energies. In fact, research predicts that the return on investment (ROI) for renewable energies is growing and will eventually become similar to the ROI of fossil fuels. This means that the economic cost that prevents governments from using renewable energies at large scale will eventually disappear. This article is part of the MAESHA project, which is funded by the European Union through the H2020 program. The goal of this project is the decarbonization of the future energy used in Mayotte and other islands by transforming the usual electrical grid into a smart grid that can adapt the demand side to the available generation at any time.
Moreover, different challenges arise when trying to manage the electrical grid of isolated areas because it is impossible to receive electricity from other countries or regions in the case of excessive demand, which can cause a blackout.
In addition, even though renewable energies are the main key to achieving the decarbonization of energy systems, their production varies significantly within short times due to environmental factors, such as the radiation of the sun or the speed of the wind, which can cause problems during peaks on the demand side.
This creates the need for a smart grid that can detect, predict, and adapt to changes in order to match the demand side with the availability on the supply side. As a consequence, predicting electrical load in advance is a very important challenge for demand-side management (DSM), especially in isolated areas.
Moreover, most of the current machine learning models are built to predict and are tested on the load demand of one region or country only, without taking into consideration the reusability of their models.
In fact, to predict the load of multiple islands with acceptable accuracy, it is important to build a model that is flexible in both space and time. First, regarding space, predicting the load in multiple regions without creating a separate prediction model for each suggests combining standard machine learning models that are known to give acceptable accuracy in different situations or regions. The result would be a reusable prediction model that can be applied directly, or with minimal changes, to different regions, islands, countries, or even multiple buildings. The second challenge is to build, in a simple way, a flexible model that can predict over a range of multiple days or a week with acceptable accuracy while also predicting a shorter range (the next 30 min or 24 h) with high accuracy.
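The trade-off between prediction range and accuracy can be illustrated with a minimal sketch. It assumes a synthetic half-hourly load curve and a naive persistence forecast (the last observed value is used as the prediction); the function names `synthetic_load` and `persistence_mae` are hypothetical, and neither the data nor the forecaster is taken from this article.

```python
import math

def synthetic_load(n_steps, steps_per_day=48):
    """Toy half-hourly load: a daily sinusoid plus a slow drift (illustrative only)."""
    return [
        100.0 + 20.0 * math.sin(2 * math.pi * t / steps_per_day) + 0.01 * t
        for t in range(n_steps)
    ]

def persistence_mae(series, horizon):
    """Mean absolute error when y[t + horizon] is predicted by y[t]."""
    errors = [
        abs(series[t + horizon] - series[t])
        for t in range(len(series) - horizon)
    ]
    return sum(errors) / len(errors)

# Two weeks of half-hourly samples (48 steps per day).
load = synthetic_load(48 * 14)

# Horizons expressed in half-hour steps.
horizons = {"30 min": 1, "3 h": 6, "12 h": 24}
mae_by_horizon = {name: persistence_mae(load, h) for name, h in horizons.items()}

for name, mae in mae_by_horizon.items():
    print(f"{name:>6}: MAE = {mae:.2f}")
```

On this toy signal the error grows with the horizon. Because the signal is perfectly periodic, a horizon of exactly 24 h would look artificially good for the persistence forecast, which is why the sketch stops at 12 h.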
2. Artificial Neural Networks for Energy
Since energy demand is a set of ordered values representing an evolution of a quantity over time, it is handled as a time series forecasting problem.
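As a minimal sketch of this framing, a load series can be converted into supervised input/target pairs with a sliding window. The helper `make_windows` and its parameters `n_lags` and `horizon` are hypothetical names chosen for illustration, not part of any model described here.

```python
def make_windows(series, n_lags, horizon=1):
    """Turn a 1-D series into (inputs, target) pairs for supervised learning.

    Each input holds n_lags consecutive past values; the target is the value
    `horizon` steps after the last value of the input window.
    """
    inputs, targets = [], []
    for start in range(len(series) - n_lags - horizon + 1):
        end = start + n_lags
        inputs.append(series[start:end])
        targets.append(series[end + horizon - 1])
    return inputs, targets

# Six consecutive load readings (arbitrary units).
load = [10, 12, 15, 14, 13, 16]
X, y = make_windows(load, n_lags=3, horizon=1)
# X == [[10, 12, 15], [12, 15, 14], [15, 14, 13]]
# y == [14, 13, 16]
```

Increasing `horizon` moves the target further into the future while keeping the same input window, which is one simple way to derive several prediction ranges from the same series.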
Predicting energy consumption is frequently done in the short term: almost 60% of the studies employing data-driven models make hourly predictions. It is equally interesting to see that the load in general is highly affected by heating, ventilation, and air conditioning (HVAC). Indeed, HVAC represents between 40% and 50% of the overall consumption of big buildings, such as offices, schools, or hotels. Forecasting this consumption is often easier since HVAC use is fairly continuous over time and depends on external parameters such as local weather or time of the year.
Moreover, since load prediction is a time series forecasting problem, the wavelet transform (WT) is an important tool for finding the different time and frequency features of the load curve. Indeed, the use of WT in [12,13] was shown to be an important preprocessing step for the prediction, and a framework of wavelet neural networks (WNN) has even been proposed. The wavelet transform has been tried with both traditional and modern types of machine learning layers, such as feed-forward or convolutional layers.
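To make the preprocessing idea concrete, the following minimal sketch implements one level of the Haar discrete wavelet transform, the simplest wavelet, in plain Python. The choice of the Haar wavelet and the function names `haar_dwt` and `haar_idwt` are assumptions made for illustration; the cited works may use other wavelet families and decomposition depths.

```python
import math

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform (even-length input).

    Returns the approximation (smoothed trend) and detail (local variation)
    coefficients, each half the length of the input.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Invert one level of the Haar transform."""
    s = math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / s)
        signal.append((a - d) / s)
    return signal

# Six consecutive load readings (arbitrary units).
load = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0]
approx, detail = haar_dwt(load)
restored = haar_idwt(approx, detail)
```

The approximation coefficients follow the slow trend of the load, while the detail coefficients capture fast changes between neighbouring samples. Either set, or both, could then be passed to a forecasting model as features.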
On the other hand, it is not surprising to find familiar methods for time series, such as auto-regressive integrated moving average (ARIMA) [16
support vector machines (SVM) [19,20]
, and artificial neural networks (ANN), applied to this field. However, a review of the field shows that artificial neural networks are the most efficient prediction models for load forecasting in the smart grid domain in general and for demand-side management specifically.
Indeed, ANNs are the most used methods for forecasting electrical load and are widely employed in this field for their numerous advantages. The complexity of this task is considerable because several factors, such as weather and holidays, affect the load through both linear and non-linear relationships, which makes it well suited to ANNs and their capacity to deal with non-linear relationships. ANNs are extremely robust and flexible, especially the multilayer perceptron (MLP); they do not need to be explicitly programmed but require data to train on. They are easy to implement but require some specialized knowledge to configure. They can be used alone, but they can also be combined with other models to obtain a hybrid prediction [16,18]. One of their disadvantages is the massive amount of data required to train the network. If there are not enough data to train the ANN, it will have difficulty generalizing and will risk overfitting.
Another type of neural network widely used in the field of time series forecasting is the long short-term memory (LSTM) network. It is an artificial recurrent neural network architecture that can process entire sequences of data, making it a preferred model for handwriting recognition, speech recognition, and time-series data. It can be used alone [24,25] or with a convolutional layer for better results. Some models combine neural networks with more classical models of time series forecasting.
These hybrid models combine two or more machine learning techniques. They are more robust, as they have the advantages of the individual techniques involved, and they improve the forecasting accuracy. By combining separate models, complex structures can be modelled more accurately. More and more papers use a hybrid approach thanks to its performance [27,28,29]. Such models often combine linear with nonlinear models to be more robust and more accurate. The most traditional hybrid models combine ARIMA for linear relationships with SVM or ANN to model the nonlinear component [17,18]. Various other methods and algorithms have also been used in prediction models, such as empirical mode decomposition (EMD), the extended Kalman filter (EKF), characteristic load decomposition (CLD), and the radial basis function neural network (RBFNN). Table 1 provides a brief review of this research on load prediction, covering short-term, mid-term, and long-term predictions and the various prediction algorithms.
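The two-stage structure of the traditional hybrid models described above (a linear model, followed by a second model fitted to its residuals) can be sketched as follows. This is a simplified illustration under stated assumptions: a least-squares AR(1) fit stands in for ARIMA, a per-time-slot mean of the residuals stands in for the SVM or ANN stage, the data are synthetic, and the error is measured in-sample. All function names are hypothetical.

```python
import math

def fit_ar1(series):
    """Least-squares fit of y[t] = a * y[t-1] + b (simple stand-in for ARIMA)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x

def fit_residual_profile(series, a, b, period):
    """Mean AR(1) residual per slot of the cycle (stand-in for the ANN/SVM stage)."""
    sums, counts = [0.0] * period, [0] * period
    for t in range(1, len(series)):
        residual = series[t] - (a * series[t - 1] + b)
        sums[t % period] += residual
        counts[t % period] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def hybrid_predict(prev_value, t, a, b, profile):
    """Linear prediction plus the learned residual correction for slot t."""
    return a * prev_value + b + profile[t % len(profile)]

# Synthetic half-hourly load with a non-smooth (rectified sine) daily shape.
period = 48
load = [100.0 + 20.0 * abs(math.sin(math.pi * t / period)) for t in range(period * 10)]

a, b = fit_ar1(load)
profile = fit_residual_profile(load, a, b, period)

# In-sample one-step-ahead errors of the linear stage alone and of the hybrid.
steps = range(1, len(load))
linear_mae = sum(abs(load[t] - (a * load[t - 1] + b)) for t in steps) / len(steps)
hybrid_mae = sum(
    abs(load[t] - hybrid_predict(load[t - 1], t, a, b, profile)) for t in steps
) / len(steps)
```

On this perfectly periodic toy signal the residual stage removes almost all of the error left by the linear stage. With real load data the gain would be smaller and should be measured on held-out data.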
Although some of the previous research has been tested on multiple datasets from different countries, to our knowledge, none of it has been built to be general enough to be applied to any case, nor is it easily reusable without modifications to the code (especially for isolated areas such as Mayotte, where blackouts happen regularly because of high load consumption).
In addition, all previous research has concentrated on a single prediction range (e.g., 1 h, 24 h, or 1 week). However, a prediction for 30 min can provide a higher accuracy than a prediction for 1 week, and
as a consequence, having multiple prediction ranges at the same time (very short-term prediction with very high accuracy and short- or mid-term prediction with slightly lower accuracy) can provide more stable results for the smart grid systems that have to interact and make decisions for one or more days in