A good way to reach such challenging goals is to implement ML models inside BASs or BDTs to improve building sustainability and reduce the related environmental impact. There are many such examples in the literature, wherein heating, ventilation, and air conditioning (HVAC); energy storage; renewable-energy production; and smart lighting systems interface with automatic control systems
[8][9][10][11][12]. Traditionally, these systems were managed using experience-based algorithms such as proportional–integral–derivative (PID) controllers, threshold actuators, or, in some cases, physical models. However, this approach can be difficult to implement due to the complex interactions between the different energy systems that coexist inside a building
[13]. Owing to the recent diffusion of Internet of Things (IoT) devices, however, it is now possible to obtain real-time data at low cost. Moreover, most of these devices are widely deployed, making them a reliable and simple solution for monitoring the environment
[14][15][16]. The collected data can then be sent to big data platforms to support innovative control strategies and historical analyses. ML models can be deployed directly on IoT devices, on an edge or fog node (a system that collects and processes data from different IoT devices and is placed near them or elsewhere in the infrastructure), or on a remote/cloud server
[17]. This system allows the models to learn directly from the environment, to consider the actual and ever-changing needs of users, and to adapt to changes almost in real time. This approach is called “data-driven” and is considered a paradigm of digitalization
[18][19]. With this approach, buildings can be monitored and can be enabled to make decisions autonomously, with or without human intervention
[20]. By integrating real-time data with geometrical information from the design phase (e.g., from building information modeling, or BIM), it is possible to develop digital twins (DTs): digital constructs that can replicate and simulate the behavior of a building in real time. It is important to emphasize that, to develop a proper DT, the system must be able to “interact” with the environment. There is also the possibility of employing a human-in-the-loop approach
[21][22]. In that case, ML models are adopted by operators as part of the control process
[23]. In the future, ML will likely be used in almost all the processes related to building management and energy systems and will be considered to have the same importance as other components, such as HVAC and electrical systems.
What follows is a brief description of some of the most relevant use cases for the application of ML models in building energy management.
3. Models and Techniques for Energy Assessment and Optimization for the Built Environment
The variety of ML models is quite wide, with many variants having been developed over time. However, they can be organized based on the task they perform, the type of learning they use, and the related input data
The classification scheme is shown in Figure 7.
Figure 7. Classification of ML models based on learning type and tasks.
Describing the learning methods of ML models is beyond the scope of this paper; for further detail, the reader can consult the literature on that subject
[99][100][101][102]. In general, however, data types can be described as follows:
- Structured: input data are well-defined and structured, with information organized and described in detail. Device names, timestamps, power, temperatures, locations, and occupancy are examples of structured data.
- Unstructured: data with no pre-defined format or organization are considered unstructured, and extracting relevant information from them is much more difficult. Textual input, word-processing documents, audio files, videos, and images can be considered unstructured data.
- Semi-structured: data that are not stored in an organized structure (such as a relational database) but retain some organizational properties. XML and JSON documents, NoSQL databases, etc., are examples of semi-structured data.
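To make the distinction concrete, here is a minimal sketch of the three categories in Python (the device names and values are hypothetical):

```python
import json

# Structured: a tabular record with a fixed, well-defined schema.
structured_row = {
    "device": "hvac_unit_3",          # hypothetical device name
    "timestamp": "2024-01-15T10:00:00",
    "power_kw": 4.2,
    "zone_temp_c": 21.5,
    "occupancy": 12,
}

# Semi-structured: self-describing JSON with no rigid schema;
# fields may vary between records but retain organizational cues.
semi_structured = json.loads(
    '{"device": "hvac_unit_3", "readings": {"temp": 21.5}, "tags": ["floor2"]}'
)

# Unstructured: free text (or audio/images) with no pre-defined format;
# relevant information must be extracted before modeling.
unstructured = "Tenant reported that the second floor felt cold this morning."
```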
Moreover, input data can be associated with metadata, which itself can be structured, unstructured, or semi-structured. Before choosing an ML model, it is important to understand the structure of the input. For interpretability, the optimal data structure is related to the user’s knowledge of the process. Usually, choosing structured or semi-structured data is advisable, especially when using ante hoc models that are intrinsically interpretable. If data are well structured, a higher level of interpretability (level 2–3) can easily be obtained, because the process is repeatable with few accuracy issues and little knowledge of the specific process under analysis is required
[101][102][103]. However, this makes achieving good interpretability with the other data structures (semi-structured and unstructured) all the more important. Below, the most-used interpretable ML models for energy management in built environments are discussed, with examples drawn from the scientific literature on the matter.
4.1. Interpretable Artificial Neural Networks
ANNs are widely used across the scientific and industrial communities because they are quite easy to use and to adapt to different contexts. They were developed to mimic the neural structure of the brain, beginning with the first studies on the perceptron
[103]. Usually, they are considered “black box” models because the internal operations performed on the data are usually obscure and difficult for the user to understand
[104]. However, in recent years, owing to their wide diffusion and the broad interest in them, new interpretable modified ANN models have emerged. For instance, Jang et al.
[105] developed an accurate DNN for daily peak-load forecasting (DPLF) using two buildings as case studies. For interpretability, they applied hyperparameter optimization and a SHAP model, improving prediction accuracy by up to 12.74% and showing how the previous day’s peak load and temperatures influence the predictions. Li et al.
[106] investigated the effectiveness of using an attention mechanism to increase an RNN’s interpretability for 24-hour building cooling-load prediction. Their results show that past energy consumption has the greatest influence on the prediction due to the building’s thermal inertia. Cheng et al.
[107] developed a physics-informed neural network that takes data from a resistance–capacitance (2R2C) thermal model. The use of a physics model gives the user knowledge of the effect of changing input parameters thanks to the intrinsic interpretability of physical law (which can be considered an ante hoc model) and the flexibility of ANN in processing data. Following the same approach, Chen and Zhang
[108] proposed a modified LSTM network using thermal dynamics parameters for modeling building thermal performance. Moreover, Di Natale et al.
[109] proposed a physically based ANN that was able to integrate environmental knowledge for building thermal modeling. Wang et al.
[110] proposed a Direct eXplainable Neural Network (DXNN) to calculate solar irradiance. By modifying the activation function of the neural network, they obtained a direct relationship between the model’s inputs and outputs. Moving to occupancy and activity detection, Cengiz et al.
[111] developed a CNN to track human activities. The model’s interpretability was increased using the Human Activity Recognition on Signal Images (HARSI) approach, which transforms accelerometer data into frequency-domain signal images. Yuan et al.
[112] proposed a passive multi-modal interpretable data-fusion algorithm based on an RNN for Human Identification and Activity Recognition (HIAR). The data coming from Software-Defined Radio (SDR) and Passive InfraRed (PIR) sensors were interpreted using a SHAP model to evaluate the results. Zhang et al.
[113] developed a Deep-Belief-Network-based Takagi–Sugeno–Kang fuzzy classifier (DBN-TSK-FC) to analyze indoor occupancy. The fuzzy classifier gives interpretability to the unsupervised Deep Belief Network (DBN) by representing the first layer of hidden nodes as a set of consequent variables of fuzzy rules. E. Kim
[114] proposed an Interpretable Convolutional Neural Network (I-CNN) for indoor activity detection by adding temporal convolution and pooling layers into a CNN. Li et al.
[115] showed that the hour of the day was the most influential feature for hourly electricity prediction using an Automatic Relevance Determination (ARD) network, a modified ANN model that reveals the relationship between input features and model output.
4.2. Encoder–Decoder
Encoder models take an input sequence and create a contextual representation of it (also called the context); decoder models then take this contextual representation as input and generate an output sequence. Encoder–decoder models therefore use textual or categorical input to generate categorical or textual output. A famous example of such a model is Google Translate, which decodes a textual input to generate translated text
[116]. Recently, the attention mechanism has been incorporated into encoder–decoder models to improve machine-translation performance and the human interpretability of the output
The attention mechanism is intended to mimic cognitive attention, using “soft” weights for each parameter under investigation that can change at each runtime, in contrast to “hard” weights that are fixed, pre-trained, and fine-tuned. Encoder–decoder models can also be used with time series
[118]. This makes the attention mechanism a useful tool for increasing the interpretability of encoder–decoder models in both regression and classification tasks.
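As a minimal, self-contained sketch of the idea (NumPy only; the dimensions and data are hypothetical), the “soft” weights of dot-product attention are recomputed for every input and sum to one, so they can be read directly as per-time-step importances:

```python
import numpy as np

def soft_attention(encoder_states: np.ndarray, query: np.ndarray):
    """Dot-product attention over a sequence of encoder hidden states.

    encoder_states: (T, d) array, one hidden state per time step.
    query:          (d,) decoder state used to score each time step.
    Returns the context vector and the attention weights, which sum to 1
    and can be inspected as time-step importances.
    """
    scores = encoder_states @ query          # (T,) alignment scores
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    context = weights @ encoder_states       # (d,) weighted summary of the sequence
    return context, weights

# Hypothetical 24-step hourly sequence with 8-dimensional hidden states.
rng = np.random.default_rng(0)
states = rng.normal(size=(24, 8))
context, weights = soft_attention(states, query=rng.normal(size=8))
print(weights.argmax())  # the hour the model "attended to" most
```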
Luo et al. [119] used an encoder–decoder architecture with a ConvLSTM attention mechanism for the prediction of multi-energy loads in a micro-energy network, showing good accuracy in forecasting energy loads. Li et al.
[120] developed an FDD model using time-series data from chillers to forecast cooling loads inside a building, demonstrating that the importance of each input feature can be obtained and its impact thereby explained. Gao and Ruan
[121] proposed an accurate building-energy-consumption prediction model, investigating three different interpretable encoder–decoder models based on LSTM and the self-attention mechanism. Their model showed the impact of daily temperatures (max, min, and mean) and dew-point temperature on the predictions. A similar approach was used by Ellis and Chinde
[122] for the prediction of HVAC loads in a closed-loop system for a single environment belonging to a multi-zone building. The results were validated using EnergyPlus software, and the model can predict the indoor air temperature and the HVAC sensible cooling rate. Lastly, Azam and Younis
[123] developed an encoder–decoder model for the prediction of energy consumption, demonstrating the influence of historical input features on the model output.
4.3. Clustering and Feature Extraction
Clustering models group input data into clusters that share homogeneous properties, whereas feature extraction analyzes raw data to extract numerical features that preserve the information of the original dataset. For example, when assessing HVAC systems, a clustering model could in principle group them by thermodynamic cycle, whereas a feature-extraction model could distill the measurements describing the efficiency of each cycle. Unlike ANNs and encoder–decoder techniques, clustering and feature extraction can be used alongside any “black box” model without modifying its processing.
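As a minimal sketch of how the two techniques pair in practice (scikit-learn, with synthetic daily load profiles standing in for real smart-meter data), simple statistical features are first extracted from the raw profiles and then clustered into typical patterns:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Synthetic stand-in for smart-meter data: 100 daily load profiles, 24 hourly values.
profiles = rng.gamma(shape=2.0, scale=1.5, size=(100, 24))

# Feature extraction: summarize each raw profile with a few statistics.
features = np.column_stack([
    profiles.mean(axis=1),    # average load
    profiles.max(axis=1),     # daily peak
    profiles.argmax(axis=1),  # hour of the peak
    profiles.std(axis=1),     # variability
])

# Clustering: group days with similar load-shape statistics.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(features)
print(np.bincount(labels))    # how many days fall in each typical pattern
```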
Liu et al. [124] proposed a clustering data-mining-based framework to extract typical electricity load patterns (TELPs) for individual buildings. To shrink the dataset, they clustered the shapes of the electricity-load profiles into five statistical features using a density-based spatial clustering model (DBSCAN) and a K-means algorithm. To increase the interpretability of the approach, the authors also implemented a CART algorithm, showing how it is possible to perform an early FDD analysis based on anomalous load profiles. Prabhakaran et al.
[125] proposed a small binary decision tree to increase the interpretability of a K-means model for indoor occupancy estimation. Galli et al.
[126] proposed a multi-step methodology based on a clustering algorithm and a LIME to investigate the energy performance classes of buildings using a large set of energy performance certificates (EPCs) as input. Choi and Kim
[127] investigated the performance of clustering and tree-based algorithms for the evaluation of energy-efficient buildings (EEB). They found that the conditional inference tree (CIT) algorithm performed best in terms of interpretability and classification accuracy compared to other decision-tree and clustering models. Tang et al.
[128] used smart meters along with demographic and socio-economic data to identify the main drivers of residential energy consumption, showing a correlation between age, education level, and load patterns. Moreover, they obtained better classification performance using feature-extraction algorithms than with XGBoost and ANN models. The good performance of simple models in clustering and feature extraction has also been evidenced in the work of Grimaldo and Novak
[129]. They obtained better accuracy using k-nearest neighbors (kNN) algorithms instead of the more complex random forest (RF) and gradient-boosted trees (GBT) models for the forecasting of energy usage in buildings. Moreover, their approach is also interpretable because they extracted the relevant information, organized it, and reduced the data complexity
[130]. The results are also published in another work, wherein they developed a smart energy dashboard using kNN and decision tree models to visualize daily energy consumption to increase user awareness
[131]. An innovative clustering technique that is intended for simulating buildings’ thermal design data was developed by Bhatia et al.
[132]. Their approach, called axis-aligned hyper-rectangles, clusters information by dividing data into hyper-rectangular boundaries that can be interpreted using specific rules. The authors created rules for the calculation of the window-to-wall ratio to assist in the design process of building envelopes in different climate zones. Kasuya
[133] proposed a Gaussian mixture (GM) model and a distribution-based clustering (DBC) algorithm for the prediction of loads for the next day, using energy data as input. Miller and Xiao
[134][135] showed that a clustering model and energy-consumption data can be used to classify living spaces by their intended use, making results interpretable.
4.4. Generalized Additive Models
GAMs were born as an improvement on linear regression and logistic regression models. They are more generalized than those models but maintain a good degree of interpretability because GAMs are ante-hoc models
[136]. Moreover, GAMs can be used on discontinuous and volatile data thanks to the use of a smoothing function and back fitting
[137].
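To show how a GAM exposes its interpretability, here is a minimal sketch (assuming the pygam package; the data and the feature roles are synthetic and hypothetical): each input receives its own smooth term, and the fitted shape of each term can be read directly as that feature’s effect on the prediction:

```python
import numpy as np
from pygam import LinearGAM, s

rng = np.random.default_rng(2)
# Synthetic stand-ins: outdoor temperature, solar radiation, hour of day.
X = np.column_stack([
    rng.uniform(-5, 35, 500),   # outdoor air temperature (C)
    rng.uniform(0, 900, 500),   # solar radiation (W/m2)
    rng.integers(0, 24, 500),   # hour of day
])
# Hypothetical heating load: falls with temperature and with radiation.
y = 40 - 0.8 * X[:, 0] - 0.01 * X[:, 1] + rng.normal(0, 2, 500)

# One smooth term per feature; the additive structure is interpretable ante hoc.
gam = LinearGAM(s(0) + s(1) + s(2)).fit(X, y)

# The fitted smooth for feature 0 is itself the explanation:
grid = gam.generate_X_grid(term=0)
partial = gam.partial_dependence(term=0, X=grid)
print(partial[:5])  # heating-load contribution as outdoor temperature varies
```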
Khamma et al. [137] used GAMs to forecast indoor heating and power usage, taking ambient air temperature, solar radiation, and time as inputs. They found that outdoor air temperature had a negative relationship with the predicted heating load, while solar radiation had a negative exponential relationship with it. Voss et al.
[138] developed a mathematical model for the calculation of building inertia thermal energy storage (BITES) for smart grid control. They used a GAM to obtain BITES input parameters from building data, demonstrating that the ceiling surface temperature can be used as a proxy for the current state of energy use. As a result, they calculated the benefit of using conventional hot water tanks instead of batteries for energy storage and the related potential to reduce the building’s carbon footprint. Li et al.
[139] used MLR, GAM, and energy performance index (EPI) to calculate the energy efficiency of a healthcare facility. The authors found that MLR was the most consistent and robust benchmarking model, while GAM appeared to have the best accuracy. Ghose et al.
[140] implemented a regression analysis using Kruskal–Wallis (KW) and GAM to support the interpretation of life-cycle assessment (LCA) data for the refurbishment of office buildings in New Zealand under four different scenarios: business-as-usual, integration of PV panels, integration with an electric renewable-energy grid, and implementation of best construction practices. González-Mahecha et al.
[141] proposed a model to evaluate the impact of renewable energy technologies for zero- or nearly-zero-energy buildings (ZEB and NZEB). GAMS was used to calculate, for every system and on an hourly basis, the demand for and production of power, using a real building in Portugal as a case study. Their model considers the use of solar PV panels and miniature wind turbines for energy production and batteries for storage. Their study also investigated the energy costs related to selling and buying electricity, making the results interpretable in economic terms. In addition, GAMs were also used to perform sensitivity analysis on input features for thermal-comfort modeling
[142] and thermal energy-storage modeling
[138] and to identify operational patterns of gas-powered HVAC systems
[143], distributed PV power plants (PP)
[144], and short-term energy prediction in buildings
[137]. The main drawback of GAMs is their simplicity: they cannot match the accuracy of more complex models and can only approximate the real behavior of the system analyzed.
4.5. Local Interpretable Model-Agnostic Explanations
In 2016, Ribeiro et al.
[145] introduced LIME as a model-agnostic method designed to provide localized interpretations of individual predictions using a local surrogate model. LIME is particularly valuable for explaining classification problems, as it can provide both contradictory and supportive information about a given prediction for each input feature.
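A minimal sketch of local explanation with the lime package (synthetic tabular data; the feature names and the “high load” label are hypothetical): a single prediction of a “black box” classifier is approximated locally by a weighted linear surrogate whose signed coefficients support or contradict each feature:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

rng = np.random.default_rng(3)
feature_names = ["outdoor_temp", "occupancy", "hour", "solar_rad"]  # hypothetical
X = rng.normal(size=(400, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic "high load" label

model = RandomForestClassifier(random_state=0).fit(X, y)  # the "black box"

explainer = LimeTabularExplainer(
    X, feature_names=feature_names, class_names=["low", "high"], mode="classification"
)
# Explain one individual prediction with a local linear surrogate.
explanation = explainer.explain_instance(X[0], model.predict_proba, num_features=4)
print(explanation.as_list())  # signed, per-feature contributions for this sample
```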
Hülsmann et al. [146] used LIME in combination with an ANN to develop an optimization model for a small energy grid of buildings using PV panels and batteries. They showed the advantages, in terms of interpretability, compactness, and robustness, of using LIME instead of sensitivity models for the evaluation of energy systems. Wastensteiner et al.
[147] applied LIME to interpret ML-based time-series classification models for the assessment of buildings’ energy consumption and then analyzed the stability and reliability of the interpretations. Tsoka et al.
[148] developed an eXplainable Neural Network (XNN) to evaluate Italian EPCs. Their approach involves the use of an ANN with a LIME model to make the results interpretable, showing that transmittance and the dimensions of opaque surfaces are the main contributors to the EPC value. Chung and Liu
[149] used a standardized regression coefficient (SRC) along with LIME and SHAP to evaluate the main parameters for the prediction of energy loads in office buildings in different climate zones via a DNN. Among the models analyzed, SHAP needed the smallest number of inputs for an accurate load prediction. Another interesting result observed by the authors is the possibility of removing weather sensors and using climate time-series data instead. Srinivasan et al.
[150] used LIME for chiller FDD, inspecting issues such as scaling in condenser fins, sensor errors caused by flow pulsations, and false alarms. They showed that LIME’s ability to provide contradicting information plays a dual role: it assists decision-makers in identifying faults and can identify false alarms generated by “black box” models. Carlsson et al.
[151] applied LIME to a regression problem, evaluating the confidence level of individual predictions related to chiller COPs.
4.6. SHapley Additive exPlanations
The SHAP algorithm was originally proposed by Lundberg and Lee
[152] in 2017. It was designed as an interpretability tool to explain individual predictions (local interpretations) or, by aggregating the model’s values, to provide global interpretations. Moreover, the authors also proposed KernelSHAP, an alternative, kernel-based estimation approach inspired by local surrogate models, and TreeSHAP, a tree-based estimation model. The method calculates the average marginal contribution of a feature value across all possible coalitions, an approach borrowed from coalitional game theory. In this way, every feature value contributes to the prediction. SHAP has attracted considerable interest from the scientific community, and, like LIME, it has been widely adopted.
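A minimal sketch with the shap package (synthetic data; the feature names are hypothetical): TreeSHAP attributes each individual prediction to the input features, and averaging the absolute attributions yields a global importance ranking:

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
feature_names = ["prev_day_peak", "outdoor_temp", "humidity", "weekday"]  # hypothetical
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, 300)  # synthetic load

model = RandomForestRegressor(random_state=0).fit(X, y)

# TreeSHAP: exact, fast Shapley values for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)           # (n_samples, n_features)

print(dict(zip(feature_names, shap_values[0])))  # local: one prediction's attribution
print(dict(zip(feature_names, np.abs(shap_values).mean(axis=0))))  # global ranking
```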
For instance, Dinmohammadi et al. [153] proposed a model for the prediction of energy consumption in residential buildings under different indoor and outdoor conditions. The authors used a PSO-optimized RF classification algorithm for the identification of the most important factors related to heating demand, a self-organizing map (SOM) for feature aggregation and dimensionality reduction, and an ensemble classification model for forecasting. SHAP was used in addition to a causal inference method to increase model interpretability, revealing a relationship between water-pipe temperature changes, air temperature, and the building’s energy consumption. Sun et al.
[154] used SHAP in combination with a DNN for the calculation of the building’s energy efficiency, using EPC data and façade information from Google Street View (GSV) as input. As a result, they developed an automated tool for the identification of the contribution of each building feature to energy efficiency. In a work by S. Park et al.
[155], SHAP was used to provide both local and global interpretations of an RF model for FDD in district heating systems. Moon et al.
[156] developed an eXplainable Electrical Load Forecasting (XELF) methodology using tree-based ensemble models (RF, GBM, XGBoost, LightGBM, and categorical boosting). SHAP was used by the authors to investigate the main contributors to energy loads, showing that the temperature-humidity index and wind-chill index are the most influential factors. Zhang et al.
[157] adopted SHAP to provide local interpretations for evaluating thermal comfort and calculating predicted mean vote (PMV) values. The authors also proposed potential solutions to enhance indoor thermal comfort based on SHAP interpretation. A similar work by Yang et al.
[158] used SHAP to evaluate three thermal-sensation models: hot, neutral, and cold. Their results showed that air temperature and relative humidity are the most influential features across all models. Shen and Pan
[159] developed an automatic tool for energy-performance assessment in buildings, supported by BIM data. Their tool comprises three components: DesignBuilder for simulation, a BO-LGBM (Bayesian optimization-LightGBM) and SHAP model for the prediction and explanation of performance, and an AGE-MOEA algorithm for the optimization of buildings. SHAP results revealed that HVAC systems have the greatest impact on energy consumption. Chang et al.
[160] utilized SHAP to analyze and reveal feature importance in PV power generation models. Their results revealed global horizontal irradiance as the most influential feature, a result aligned with the results of Pearson correlation (PC) analyses. Arjunan et al.
[161] proposed integrating the EnergyStar++ assessment with SHAP models to explain why a building achieves a particular score. They used multiple linear regression (MLR) with feature interactions (MLRi) and GBT to calculate the scores and SHAP to evaluate the contributions, showing how the features under analysis (number of workers, number of computers, gross floor area, working hours/week, cooling degree days, and cooled gross floor area) are related to the result. Gao et al.
[162] used SHAP to interpret RF and light gradient boosting machine (LightGBM) models for chiller FDD. The SHAP model can also be used for occupancy-related tasks, such as predicting CO2 concentration
[163]. Jang et al.
[105] proposed a methodology for DPLF based on a robust and interpretable DPLF (RAID) model. Their goal was the development of a low-resource-intensive model that can be used on systems without GPU hardware. SHAP was used to make an MLPRegressor and an Optuna optimizer interpretable, revealing the influence of the previous day’s peak load and temperature-related variables on the predictions. Park and Park
[164] used SHAP to rank feature importance in the prediction of natural ventilation rates, finding that the most influential features are pressure differences, outdoor temperature, and wind speed. Wenninger et al.
[165] performed an analysis of EPCs in England and Wales based on retrofitting interventions, house prices, and socio-demographic information. The authors used SHAP to identify the key factors and relationships between the features, showing that many interventions were related to CO2 emissions and suggesting easy-to-implement policy measures. Papadopoulos and Kontokosta
[166] used SHAP to interpret the results of an XGBoost-based energy benchmark for residential buildings. Akhlaghi et al.
[167] used SHAP to interpret performance-related indices (cooling capacity, coefficient of performance (COP), and wet/dew point efficiency) for a dew-point cooler, showing the relationship between cooling capacity and intake air velocity.
4.7. Other Techniques
Other interpretable ML techniques can be used for energy assessment. For instance, permutation importance (PI) assesses feature importance by shuffling feature values and observing the impact on model predictions: a feature is considered important if shuffling it leads to a substantial increase in prediction error.
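A minimal sketch with scikit-learn (synthetic data; feature names are hypothetical): each feature column is shuffled several times, and the mean drop in score is reported as that feature’s importance:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(5)
feature_names = ["outdoor_temp", "hour", "occupancy"]  # hypothetical
X = rng.normal(size=(300, 3))
y = 1.5 * X[:, 0] + 0.3 * X[:, 2] + rng.normal(0, 0.1, 300)  # synthetic load

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Shuffle each column 10 times; the mean score drop is the feature's importance.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(dict(zip(feature_names, result.importances_mean)))
```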
Carlsson et al. [151][168] used PI to increase the interpretability of an ANN-based electricity-load prediction. Chahbi et al.
[169] used both RF and PI models for the evaluation of building energy consumption. RF was used for the predictions, and PI was used to make the outcomes interpretable. Singh et al.
[170] proposed a component-based ML model (CBML) for building energy prediction. Their approach is based on decomposing the building into small components and calculating intermediate parameters from heat flows and energy components for each zone, using the aggregate of these values to calculate the total energy demand. C. Zhang et al.
[171] proposed a hybrid prediction method based on LSTM networks and ANN to forecast the energy loads of buildings. To interpret the results, they used a dimensionless sensitivity index (DSI) and a weighted Manhattan distance to quantify feature importance. Alfalah et al.
[172] evaluated the number of occupants in a school building, as well as the occupancy patterns and profiles. They used a hidden Markov model (HMM) for the predictions and Kullback–Leibler (KL) divergence for the interpretation of the results. KL divergence was also used by Kim and Cho
[173] to measure feature relevance in energy prediction, using latent states from an encoder–decoder model.
Feature importance can also be assessed using tree-based methods by calculating each feature’s contribution to reducing error within the tree model. These methods include RF, gradient boosting machine (GBM), XGBoost, and Cubist
[174][175].
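As a minimal sketch (scikit-learn, synthetic data, hypothetical feature names), impurity-based importances can be read directly off a fitted random forest:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(6)
feature_names = ["wall_u_value", "window_wall_ratio", "setpoint"]  # hypothetical
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(0, 0.1, 300)  # synthetic energy use

model = RandomForestRegressor(random_state=0).fit(X, y)

# Each feature's total contribution to impurity reduction across all trees.
print(dict(zip(feature_names, model.feature_importances_)))
```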
Wang et al. [176] used regression tree (RT) and support vector regression (SVR) for hourly energy-use prediction in two educational buildings in north central Florida. RF was used to identify the most influential features during both semesters, showing how they vary by semester and indicating the existence of different operational conditions for the tested buildings. Smarra et al.
[177] developed a data-driven model predictive control (DPC) technique for building energy optimization and climate control using RT and RF models. Liu et al.
[178] developed a mixed model for the forecasting of building energy consumption based on the design of the building envelope. RF was used to predict building energy consumption and rank the importance of each parameter, then a Pearson function was used to evaluate the corresponding correlations. Their results revealed that the most influential parameters were the heat-transfer coefficients of the exterior walls and outer windows and the window-wall ratio. Liu et al.
[179] developed a rule-set surrogate model to replace an RF model for building energy prediction. Touzani et al.
[180] predicted the energy consumption of buildings using a GBM model and found that the model performed better than RF. Zhang et al.
[181] combined the effects of building characteristics, building geometry, and urban morphology for the evaluation of energy consumption and carbon footprint. They used a LightGBM model integrated with SHAP to provide insights into the results. Yigit
[182] developed an integrated thermal-design-optimization model for an existing building consisting of a genetic algorithm (GA) and GBM. Interpretability arises from the evaluation of energy-saving measures used as input parameters for the model, such as WWR, insulation thickness, and the orientation of the building. Moon et al.
[175] used a Cubist regression to rank the importance of features for energy forecasting, revealing that external factors such as outdoor air temperature and the dates of holidays, along with internal factors like one-day-ahead and one-week-ahead energy loads, play fundamental roles. Sauer et al.
[183] used an adaptively tuned XGBoost algorithm to predict the cooling and heating system loads of residential buildings. Mohammadiziazi and Bilec
[184] used four different models (RF, XGBoost, single RT, and multiple linear regression), to predict energy-use intensity (EUI) across the USA during the 21st century. They showed that outcomes are related to climate data and that it is therefore crucial to use a comprehensive building dataset to assess energy consumption. Huang et al.
[185] compared the performances of LSTM, SVR, and extreme gradient boosting (XGBoost) networks for the forecasting of energy consumption in public buildings. Their study showed that the choice of the most suitable model depends largely on the intrinsic characteristics of the building energy data. Sipple
[186] proposed an unsupervised anomaly-detection method for ANNs to identify power-meter device failures in office buildings, employing an integrated gradients approach to interpret anomalies. Zhang et al.
[187] investigated FDD based on building-energy-consumption anomalies. Anomaly detection is considered a one-sided process; therefore, to extract the correlations between features and their influence on the output, the authors introduced a graph convolutional network (GCN) enhanced by a graph attention mechanism. Lei et al.
[188] took data from the building energy monitoring system (BEMS) of a university building to develop an anomaly-detection analysis using a clustering algorithm and particle swarm optimization (PSO) to improve detection and support the adjustment of building-management strategies. Counterfactual explanation is yet another method for generating local interpretations of individual samples: it creates nearby samples with minimal feature changes that alter the model’s output.
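A minimal sketch of this idea with the dice-ml package (synthetic data; the features and the “high demand” label are hypothetical): given one query sample, DiCE searches for nearby samples with minimal feature changes that flip the classifier’s output:

```python
import numpy as np
import pandas as pd
import dice_ml
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "outdoor_temp": rng.uniform(-5, 35, 400),  # hypothetical features
    "occupancy": rng.integers(0, 50, 400),
})
# Synthetic "high demand" label: cold weather or crowded spaces.
df["high_demand"] = ((df["outdoor_temp"] < 5) | (df["occupancy"] > 30)).astype(int)

clf = RandomForestClassifier(random_state=0).fit(
    df[["outdoor_temp", "occupancy"]], df["high_demand"]
)

data = dice_ml.Data(dataframe=df, continuous_features=["outdoor_temp", "occupancy"],
                    outcome_name="high_demand")
model = dice_ml.Model(model=clf, backend="sklearn")
explainer = dice_ml.Dice(data, model, method="random")

# Counterfactuals: minimal changes to one query sample that flip its predicted class.
query = df.drop(columns="high_demand").iloc[[0]]
cfs = explainer.generate_counterfactuals(query, total_CFs=2, desired_class="opposite")
cfs.visualize_as_dataframe()
```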
Sakkas et al. [189] selected features through statistical analysis and then utilized them in a diverse counterfactual explanations (DiCE) model to conduct counterfactual analysis and interpret energy-demand forecasting. Tran et al.
[190] developed an innovative context-aware evolutionary learning algorithm (CELA) to both increase the capabilities of existing evolutionary learning methods in handling many features and datasets and to provide an interpretable model based on the automatically extracted contexts. The authors tested their algorithm on real-world energy-prediction tasks for buildings, with performances comparable to those of XGBoost and ANN.
5. Discussion and Conclusions
ML techniques are currently used in many different technological fields, and their use is expected to rise further in the coming years thanks to the spread of smart devices, the increase in computational power, and the ever-increasing presence of technology in everyday life. Using ML algorithms, it is possible to reduce resource consumption in the built environment. However, users and stakeholders may not trust the output of an algorithm because of the intrinsic difficulty of interpreting what it means. A search of the Scopus database for published papers containing the keywords “Energy”, “Building”, and “Machine Learning” revealed more than 1700 records. However, if the keywords “interpretable” (or “interpretability”) or “explainable” (or “explainability”) are added to the search, the number of papers decreases to 140, with the first work having been published in 2014
[191]. The numbers began to rise in 2019, with 12 papers published that year; for reference, 53 such records were published in 2023. This overview cannot be considered a deep investigation but is intended to show the rising interest of the international scientific community in this topic. For context, this work reviewed more than 200 papers, all concerned with the usefulness of ML for energy management in the built environment, 98 of which specifically focus on interpretability issues. Based on the study of the current literature, the main applications are FDD, load and power management, and occupancy/activity prediction. This work also describes the most-used models for prediction/classification and for the assessment of interpretability. A summary, based on the reviewed literature, of the main advantages and disadvantages of the different interpretable ML models, as well as the applicable reference models, is reported in Table 1.
Table 1. Main advantages, disadvantages, and applicability of interpretable ML models, based on the literature analysis.

Interpretable Artificial Neural Networks (applicable to: ANN, CNN, DNN, RNN)
Advantages: high accuracy and performance; can use text, images, and tabular data as input; model agnostic.
Disadvantages: prone to overfitting; mainly for local interpretations; requires custom development; post hoc; long training time; sensitive to badly structured data.

Encoder–Decoder (applicable to: RNN)
Advantages: captures contextual information; suitable for natural language processing; long short-term memory; pre-trained models can be used; model agnostic.
Disadvantages: mainly for local interpretations; performance similar to ANNs; difficulty coping with long inputs; long and complex training; model specific; requires custom development; post hoc; sensitive to badly structured data.

Clustering and feature extraction (applicable to: K-means, DBSCAN, GBT, CIT, XGBoost, kNN, DBC, GM, DT, RF)
Advantages: ante hoc; easily interpretable; low preparation time; resistant to badly structured data; non-linear and non-parametric; can be used with different feature types (categorical and numeric); both local and global interpretations.
Disadvantages: unstable across different datasets; prone to overfitting; limited performance for regression tasks; model specific; non-continuous; long training time on large datasets.

Regressors (applicable to: linear and logistic regression, SVR, MLR, Lasso)
Advantages: ante hoc; easily interpretable; low training and preparation time; both local and global interpretations.
Disadvantages: unstable on badly structured data; parametric; model specific; suitable only for numeric datasets; low accuracy compared to other models; difficulty in addressing complex problems; selective and possibly contrastive results.

Generalized Additive Models (GAM)
Advantages: combines the advantages of linear and logistic regressions; easily interpretable; applicable to all regression tasks; flexible and regularizable; good performance and training time; resistant to badly structured data; model agnostic; both local and global interpretations; ante hoc.
Disadvantages: parametric; relies on assumptions about the data-generating process; less interpretable than linear and logistic regressions; suitable only for numeric datasets.

Local Interpretable Model-Agnostic Explanations (LIME) (applicable to: all models)
Advantages: surrogate model; works on different data types (text, images, and tabular data); applicable to all ML models; fidelity measure; human-friendly explanations; different features can be used in the training model; model agnostic; both local and global interpretations.
Disadvantages: post hoc; difficulty in defining a good kernel; data sampled from a Gaussian distribution only; the complexity of the interpretation must be defined in advance; instability of explanations.

SHapley Additive exPlanations (SHAP) (applicable to: all models)
Advantages: surrogate model; gives insight into contrastive explanations; based on LIME; fast to implement; applicable to all ML models; fidelity measure; human-friendly explanations; model agnostic; both local and global interpretations.
Disadvantages: post hoc; slow training; ignores feature dependencies; can produce unintuitive results; can be misinterpreted or give bad interpretations.

Other techniques (applicable to: GA, PI, DiCE, and combinations of other ML models)
Advantages: both local and global interpretations; high accuracy; partially post hoc; partially model specific; works on different data types (text, images, and tabular data).
Disadvantages: interpretability depends on the specific implementation; custom implementation; complexity; partially model specific; partially post hoc; instabilities and unexpected results.
However, interpretability still faces some limitations. First, there is still difficulty in defining what is meant by interpretability. Second, many works lack a comparison between different interpretable models and do not adopt a common definition framework; it is therefore difficult to evaluate the real capability of the developed models to provide valuable interpretations. Moreover, according to Krishnan [212], there are two further challenges associated with interpretability. The first relates to the general lack of knowledge about what can be provided: it can be difficult to assess whether a provided interpretation is formally valid, given that interpretable models are used precisely to address the lack of such interpretation. The second concerns the user’s responsibility: interpretations are oriented toward justifying a result or demonstrating non-discrimination. If users base their evaluation only on the provided interpretation, the real causes can be misunderstood, reducing or limiting attention to other parameters. Furthermore, misuse of an interpretable model can artificially shrink the solution space, identifying a possible solution as the problem itself.
The introduction of a common framework is advisable for the assessment of the different approaches. Finally, the AEC sector has only recently started to adopt, within its processes, technologies originally developed for other industries. Filling this gap still requires time, knowledge, and expertise, but it will be one of the key challenges for a more sustainable future.