Model-Free HVAC Control in Buildings: Comparison
Please note this is a comparison between Version 2 by Sirius Huang and Version 1 by Panagiotis Michailidis.

The efficient control of heating, ventilation, and air-conditioning (HVAC) devices in buildings is essential for achieving energy savings and comfort. To balance these objectives efficiently, advanced control strategies must adapt to varying environmental conditions and occupant preferences. Model-free control approaches for building HVAC systems have gained significant interest due to their flexibility and their ability to adapt to complex, dynamic systems without relying on explicit mathematical models. This work presents a comprehensive review of the most impactful research applications of recent years concerning RL, ANN, FLC, and hybrid model-free control applications for HVAC systems in buildings. Through a holistic evaluation of multiple research efforts, the primary aim is to identify trends in model-free HVAC control and to outline future directions in the field.

  • model-free control
  • optimal control
  • HVAC
  • reinforcement learning
  • artificial neural networks
  • fuzzy logic control

1. General Description of HVAC Systems

HVAC Operations and Types

HVAC systems operate on the fundamental principles of thermodynamics and heat transfer. By leveraging processes like conduction, convection, and radiation, these systems either add or remove heat from indoor spaces. Ventilation components ensure proper air circulation, maintaining optimal air quality. Collectively, HVAC systems create a balanced indoor climate, ensuring both comfort and health for occupants [5,32][1][2].
More specifically, HVAC systems utilize a medium (like a refrigerant, water, or air) to transport heat and fans or pumps to facilitate the flow of this medium and air. HVAC systems include the following operations:
  • Cooling Operation: The cooling operation of an HVAC system starts with the compressor, where the refrigerant is pressurized and heated, converting it into a high-pressure, high-temperature gas. This gas then flows through the condenser coils, typically located outside the building. As outdoor air is blown over these coils by a fan, the heat from the refrigerant dissipates into the environment, causing the refrigerant to condense into a high-pressure liquid. This liquid then passes through the expansion valve, where its pressure drops suddenly, leading to a significant decrease in temperature. The cold refrigerant then flows into the evaporator coil situated inside the building. As indoor air is circulated over these coils by another fan, the refrigerant absorbs heat from the air, thereby cooling it. The refrigerant, now warmed, returns to the compressor, and the cycle repeats.
  • Heating Operation: The heating operation essentially reverses this process. The system extracts heat from the outdoor air even when it is cold, amplifies it using the compressor, and then transfers this heat indoors through the indoor coil, thereby warming the interior space.
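As a back-of-the-envelope illustration of the heat-transfer principle behind these operations, the ideal (Carnot) coefficient of performance bounds how much heat a vapor-compression cycle can move per unit of work. The sketch below is illustrative only; the temperatures are assumed example conditions, not figures from any cited study:

```python
def carnot_cop_heating(t_indoor_c: float, t_outdoor_c: float) -> float:
    """Upper bound on heat-pump heating COP: T_hot / (T_hot - T_cold), in kelvin."""
    t_hot = t_indoor_c + 273.15
    t_cold = t_outdoor_c + 273.15
    return t_hot / (t_hot - t_cold)

def carnot_cop_cooling(t_indoor_c: float, t_outdoor_c: float) -> float:
    """Upper bound on cooling COP: T_cold / (T_hot - T_cold), in kelvin."""
    t_cold = t_indoor_c + 273.15
    t_hot = t_outdoor_c + 273.15
    return t_cold / (t_hot - t_cold)

# The smaller the indoor/outdoor temperature gap, the higher the ideal COP.
print(round(carnot_cop_heating(20.0, 5.0), 1))   # heating, 20 °C in / 5 °C out → 19.5
print(round(carnot_cop_cooling(24.0, 35.0), 1))  # cooling, 24 °C in / 35 °C out → 27.0
```

Real equipment achieves only a fraction of these bounds, but the trend explains why heat pumps are hardest to operate efficiently during temperature extremes and easiest during transitional seasons.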
Several HVAC types are commonly used in various regions and building types [5][1]. The most common types of HVAC equipment, as denoted in the literature [5][1], are as follows:
  • Air-Conditioners (A/C): These are designed to cool the air in a space and include central air-conditioners, window units, or split systems. The control challenge involves precise temperature regulation while optimizing energy consumption, especially for central systems that need to account for the entire building’s thermal dynamics.
  • Heat Pumps: These pumps provide both heating and cooling by transferring heat energy from one place to another and include types like air-source, ground-source, and water-source pumps. The control challenge usually concerns the optimization of heat transfer, especially during transitional seasons when temperature differences are minimal.
  • Air-Handling Units (AHUs): These units condition and circulate air as part of an HVAC system and consist of components like blowers, heating or cooling elements, and filters. The control challenge lies in the coordination of these components to ensure optimal air circulation and conditioning while minimizing energy use.
  • Variable Air-Volume (VAV) Systems: These systems supply variable airflow rates to save energy and better control comfort. In order to potentially optimize their operation, the adjustment of airflow rates in real time based on occupancy and thermal demand is necessary.
  • Radiant Heating Devices: These devices transfer thermal energy for space heating through connections to boilers or operate using electricity. The control challenge is to maintain consistent heat output and ensure efficient heat transfer.
  • Boilers: These produce hot water or steam for heating, which is then circulated through pipes. The potential control challenge in this type of equipment usually concerns the preservation of the desired temperature and pressure, ensuring efficient fuel combustion.
  • Coolers: Evaporative coolers work by evaporating water to cool the air, which is effective in dry regions. For their efficient operation, it is necessary to optimize the evaporation process and manage water consumption.
  • Furnaces: These are high-temperature heating devices used for central heating. The control challenge is to achieve high-temperature heating without wasting fuel and ensure even distribution of heated air.
  • Multi-HVAC Systems: These systems integrate multiple types of HVAC equipment into a single framework, enabling zoning. Here, the control challenge is significantly more demanding than single-HVAC units. The coordination of various components to work harmoniously while considering the distinct thermal demands of different zones presents a significantly more complicated task.

2. Model-Free Applications in HVAC Control

This section discusses the numerous highly cited model-free research applications related to HVAC control optimization in an effort to categorize them into the aforementioned model-free HVAC control application sub-fields: reinforcement learning (RL); artificial neural network (ANN); fuzzy logic controller (FLC); hybrid (i.e., the integration of multiple model-free methodologies); and other applications that are not related to any of the aforementioned approaches. Along with analyzing the related highly cited applications from the period 2015–2023, this section also presents tables that contain summaries of each model-free control sub-field. To this end, Table 1, Table 2, Table 3, Table 4 and Table 5 contain the following features:
  • Reference: Denoted as Ref. in the first column of each table.
  • Year: The publication year of each research application.
  • Methodology: The specific RL/ANN/FLC/hybrid/other type of control methodology applied in the related work.
  • Agent: Indicates whether the applied control strategy utilizes a single- or multi-agent control philosophy.
  • HVAC: The specific HVAC equipment type of each application, as described in the published work. Air-conditioning is denoted as AC; heat pumps as heat pumps; radiant heating as radiators; cooling devices as coolers; variable air-volume equipment as VAV; air-handling units as AHUs; and multi-HVAC frameworks integrating more than a single device for control as multi.
  • Single-zone: An “x” in this column indicates that the testbed application concerns a single-zone building control application.
  • Multi-zone: An “x” in this column indicates that the testbed application concerns a multi-zone building control application.
  • Simulation: An “x” in this column indicates that the testbed application concerns a simulation building control application.
  • Real-life: An “x” in this column indicates that the testbed application concerns a real-world or real-life building control application.
  • Residential: An “x” in this column indicates that the testbed application concerns a residential building control application.
  • Commercial: An “x” in this column indicates that the testbed application concerns a commercial building control application.
  • Citations: Indicates the number of citations of the related work according to Scopus.
The abbreviation “NaN” denotes elements that could not be identified in the tables and figures. In the following subsections, the integrated research applications are described regarding their motivations, their conceptual control methodologies, and their results.

2.1. Literature Review of Reinforcement Learning Control Applications

In 2015, Barrett et al. presented a pioneering study introducing a novel architecture for reinforcement learning (RL), with its primary application in creating an intelligent thermostat for HVAC systems [46][3]. This framework was aimed at the autonomous regulation of HVAC systems, prioritizing the dual optimization of energy expenses and occupant comfort. To enable the successful deployment of reinforcement learning within HVAC control, the researchers proposed a unique formalization of the state-action space. Their findings highlighted the efficacy of the RL paradigm, which achieved up to a 10% cost reduction when benchmarked against a traditional programmable thermostat while ensuring superior levels of occupant comfort. A 2017 study by Wei et al. [47][4] examined the application of deep reinforcement learning (DRL) to optimal HVAC system control. This study leveraged the EnergyPlus simulation framework, revealing that the DRL methodology surpassed the performance of conventional rule-based control (RBC) and traditional Q-learning RL approaches. Empirical evaluations utilizing precise EnergyPlus models, alongside real-world weather and pricing data, demonstrated the superior efficiency of the implemented DRL-based algorithms. These encompassed a standard DRL algorithm, as well as a heuristic adaptation catering to multi-zone control, both of which proved proficient in curtailing energy expenses while preserving a pleasant ambient temperature. Also in 2017, Wang et al. [48][5] focused on optimizing energy use in buildings by controlling the HVAC system with a model-free actor–critic reinforcement learning (RL) controller using long short-term memory (LSTM) networks. The goal was to improve thermal comfort and energy consumption.
The RL controller was tested in an office space model, resulting in an average improvement of 15% in thermal comfort and 2.5% in energy efficiency compared to traditional controls. The RL controller offered the possibility of implementing customized control in building HVAC systems with minimal human intervention. In a compelling 2018 study, Chen et al. [49][6] proposed the application of a model-free Q-learning RL control approach to optimize HVAC and window systems. The objective was to simultaneously attenuate energy consumption and thermal discomfort. At each time step, the control system assessed the indoor and outdoor environments, considering factors such as temperature, humidity, solar radiation, and wind velocity to formulate optimal control decisions congruent with current and future goals. The effectiveness of the approach was confirmed through illustrative simulation case studies in a hot and humid climate (Miami, USA) and a warm, temperate climate (Los Angeles, USA). The results demonstrated a notable edge over the RBC heuristic control strategy, achieving a reduction in energy consumption by 13% and 23%, a decrease in discomfort degree hours by 62% and 80%, and mitigation of high humidity hours by 63% and 77% in Miami and Los Angeles, respectively. Another interesting study was conducted by Chen et al. in 2019 [50][7], who considered the Gnu-RL methodology. Gnu-RL represents a novel approach to HVAC control that uses historical data from existing HVAC controllers in order to enable the practical deployment of reinforcement learning (RL) without prior information. Gnu-RL adopts a differentiable model predictive control policy for planning and system dynamics and uses imitation learning for pre-training. The agent then continues to improve its policy using a policy gradient algorithm. 
In the simulation experiment, Gnu-RL achieved a 6.6% reduction in energy consumption compared to the best-published RL result in the same environment while maintaining a higher level of occupant comfort. In both the simulation and real-world testbed experiments, Gnu-RL demonstrated up to a 16.7% decrease in cooling demand compared to the existing controller while maintaining occupant comfort. In noteworthy research from 2019, Valladares et al. [51][8] introduced a DRL algorithm that aimed to achieve an equilibrium between optimal thermal comfort and air quality levels while curtailing the energy demands of air-conditioning units and ventilation fans. The AI agent’s training was based on 10-year simulated data from a laboratory and classroom setting, with occupancy rates of up to 60 users. The adept RL agent balanced the requirements of thermal comfort, indoor air quality, and energy consumption, leading to enhanced outcomes. This included an improved predicted mean vote (PMV), an index that predicts people’s thermal sensation on a seven-point scale (+3, hot; +2, warm; +1, slightly warm; 0, neutral; −1, slightly cool; −2, cool; −3, cold). Furthermore, the AI-controlled system exhibited a 10% decrease in CO2 levels compared to the existing control system while demonstrating an energy reduction of 4–5%. Also in 2019, an important work by Liu et al. [52][9] introduced a novel deep deterministic policy gradient (DDPG) algorithm for short-term energy consumption prediction in HVAC systems. The study utilized an autoencoder (AE) to efficiently process raw data, enhancing the DDPG’s ability to identify important features and thereby improving the prediction model. Real-world data from the operation of a ground-source heat pump system in an office building in Henan, China, were exploited to train and test the models. 
The results illustrated that the DDPG-based models were well suited for short-term energy consumption forecasting, with MAE, RMSE, and R2 values of 3.858, 19.092, and 0.992, respectively. The integration of the autoencoder also proved beneficial. In comparison to traditional models such as backpropagation ANNs and support vector machines (SVMs), the AE-DDPG methodology enhanced prediction efficiency: the R2 improved by more than 1.12%, whereas the MAE and RMSE were reduced by more than 22.46% and 25.96%, respectively. To address the challenges of low sample efficiency and safety-aware exploration in DRL techniques such as deep Q-networks (DQNs) applied to complex HVAC systems, Zhang et al. [53][10] proposed an RL approach tailored for learning system dynamics using an artificial neural network. The multifunctional HVAC equipment encompassed elements such as a heat exchanger, a chiller providing chilled water to the heat exchanger, a circulating air fan, the thermal space, connecting ductwork, dampers, and mixing air components. Control was conducted via a random-sampling shooting approach, and the proposed method was evaluated through simulation in a two-zone data center case study using the EnergyPlus tool (https://energyplus.net/, accessed in 2019). The results demonstrated a reduction in total energy consumption ranging from 17.1% to 21.8% compared to baseline approaches, achieving convergence rates 10x faster than other model-free RL approaches while maintaining an average deviation of trajectories sampled from learned dynamics below 20%. In 2019, Gao et al. [54][11] proposed an innovative RL framework known as DeepComfort. Its dual objectives were to optimize energy utilization while upholding thermal comfort in smart buildings through the application of deep reinforcement learning. The building’s thermal control was construed as a cost-minimization problem, considering both the HVAC system’s energy expenditure and the occupants’ thermal comfort.
To accomplish this, the researchers deployed an FNN to predict the occupants’ thermal comfort. Additionally, they utilized deep deterministic policy gradients (DDPGs) to learn the thermal control policy. The validation of this approach was executed via a building thermal control simulation system encompassing diverse scenarios. According to the evaluation, the implementation resulted in substantial improvements, including a 14.5% improvement in the prediction accuracy of thermal comfort, a 4.31% reduction in the HVAC system’s energy consumption, and a commendable 13.6% improvement in maintaining occupants’ comfort. In 2020, Azuatalam et al. [55][12] put forth a comprehensive framework encapsulating an efficient reinforcement learning (RL)–PPO controller for a holistic building model. This framework aimed at optimizing HVAC operations by improving energy efficiency and comfort, as well as achieving pertinent demand-response objectives. The multifunctional HVAC system was equipped with a VAV system, a boiler, and a chiller. The simulation results showed that the employment of RL for routine HVAC operations could culminate in a peak weekly energy reduction of up to 22% when compared to a manually constructed baseline controller. Furthermore, the adoption of a demand-response-aware RL controller during periods of demand response could potentially lead to average power decreases or increases of up to 50% on a weekly basis compared to a standard RL controller. In an interesting study in 2020, Zou et al. [56][13] introduced a strategy designed to optimize the control of air-handling units (AHUs) using long short-term memory (LSTM) networks. The aim was to replicate the functioning of real-world HVAC systems and create efficient deep reinforcement learning (DRL) training conditions. The plan also used advanced DRL algorithms, such as deep deterministic policy gradients, to achieve optimal control of AHUs.
The model was tested on three AHUs, each with two years’ worth of building automation system (BAS) data. The hybrid LSTM-based DRL training environments, which were generated from the first year’s BAS data, achieved a highly accurate approximation of the AHU parameters. When deployed under test conditions created from the second year’s BAS data, the DRL agents managed to save 27–30% energy compared to the actual consumption while ensuring a low level of predicted discomfort. Also in 2020, Lork et al. [57][14] proposed a data-driven approach designed to address uncertainty in the control of split-type inverter air-conditioners (ACs) in residential buildings. The scientists compiled data from similar ACs and residential units to create balanced datasets and then used Bayesian convolutional neural networks (BCNNs) to model AC performance and uncertainties in these data. Next, a Q-learning-based reinforcement learning algorithm was utilized to make set-point decisions, using the BCNN models for transition sampling. An illustrative case study based on this framework was used to demonstrate the effectiveness of the approach. According to the experimental results, the controller aware of uncertainties performed better compared to conventional rule-based control (RBC), achieving a 7.69% improvement in discomfort measures and a 3.59% improvement in energy-saving potential. Another notable study from 2020 aimed at reducing the energy cost of a multi-zone commercial building’s HVAC system while considering random zone occupancy, thermal comfort, and indoor air quality comfort. However, this optimization problem was complex since it struggled with unknown thermal dynamics, parameter uncertainties, and constraints associated with indoor temperature and CO2 concentration. Moreover, the large discrete solution space and non-convex and non-separable objective function made it even more challenging. To this end, Yu et al. 
[58][15] reformulated the energy cost minimization problem as a Markov game and proposed an HVAC control algorithm based on multi-agent deep reinforcement learning with an attention mechanism. The proposed algorithm was able to operate without prior knowledge of uncertain parameters or building thermal dynamics models. The simulation results using real-world data showed that the proposed multi-agent deep reinforcement learning (MADRL) algorithm was effective, robust, scalable, and capable of reducing the total energy cost by 75.25% and 56.50% when compared to the rule-based scheme and heuristic control scheme while delivering an adequate comfort level for the occupants.
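Several of the studies above build on the tabular Q-learning update. The following minimal sketch applies that update to a toy thermostat agent; the room model, reward weights, and discretization are illustrative assumptions for exposition, not the setup of any cited paper:

```python
import random

random.seed(0)

ACTIONS = (0, 1)           # 0: heating off, 1: heating on (illustrative action set)
SETPOINT = 21.0            # assumed comfort target, °C
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2

def step(temp, action):
    """Toy first-order room model: the room drifts toward a 10 °C ambient,
    and heating adds 1 °C per step; the reward trades comfort against energy."""
    temp = temp + 0.1 * (10.0 - temp) + (1.0 if action else 0.0)
    reward = -abs(temp - SETPOINT) - 0.5 * action
    return temp, reward

def bucket(temp):
    """Discretize temperature into integer Q-table states within [0, 30]."""
    return max(0, min(30, int(round(temp))))

Q = {(s, a): 0.0 for s in range(31) for a in ACTIONS}

for _ in range(500):                   # short episodes with random starts for coverage
    temp = random.uniform(10.0, 25.0)
    for _ in range(100):
        s = bucket(temp)
        a = random.choice(ACTIONS) if random.random() < EPS else max(ACTIONS, key=lambda x: Q[(s, x)])
        temp, r = step(temp, a)
        s2 = bucket(temp)
        # Q-learning update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, x)] for x in ACTIONS) - Q[(s, a)])

policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in (12, 18, 24)}
print(policy)  # heating should be on well below the setpoint and off above it
```

The DRL variants surveyed here (DQN, DDPG, PPO) replace the Q-table with neural function approximators so that continuous states such as humidity, CO2, and price signals can be handled, but the underlying trial-and-error update is the same.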
Table 1.
Summary of RL model-free approaches for HVAC control (2015–2023).
Table 2.
Summary of ANN model-free approaches for HVAC control (2015–2023).

2.2. Literature Review of Artificial Neural Network Control Applications

In another important work from 2017, Ahmad et al. [71][28] evaluated the performance of a common feedforward neural network (FNN) trained with backpropagation to estimate the hourly HVAC energy consumption of a hotel in Madrid, Spain. The optimization performance of the FNN was compared with that of the random forest (RF), an ensemble methodology increasingly utilized in forecasting. The inclusion of social variables like guest numbers slightly boosted predictive accuracy in both scenarios. According to the evaluation results based on the root-mean-square error (RMSE), mean absolute percentage error (MAPE), mean absolute deviation (MAD), coefficient of variation (CV), and R2 metrics, the FNN surpassed the RF across all metrics, although both methodologies exhibited broadly similar predictive accuracy, indicating that they were almost equally viable for applications in building energy management. In 2018, Gonzales et al. [72][29] introduced an innovative multi-agent system (MAS) approach within a cloud-based ecosystem coupled with a wireless sensor network (WSN) to enhance HVAC energy efficiency. The entities within the MAS acquired social patterns through data assimilation and the application of an artificial neural network (ANN). Moreover, the system utilized sensor data to adapt to the building’s climate and occupancy and also incorporated weather forecasts and non-working periods to optimize HVAC operations. The approach allowed smoother temperature adjustments, reducing the sudden shifts that escalate energy use. According to the case study evaluation, the strategy achieved an average energy saving of 41% in office spaces. The reduction in energy consumption was not linearly related to the difference between the outdoor and indoor temperatures. Also in 2018, Deb et al. [73][30] focused on the development of two data-driven forecasting tools for energy conservation linked to HVAC systems in commercial buildings in Singapore.
Two predictive frameworks, multiple linear regression (MLR) and an artificial neural network (ANN), were formulated. The essence of the research revolved around choosing the optimal predictors, involving an extensive exploration of 819,150 permutations of 14 variables to pinpoint the most precise model. The main metric observed was the variance in energy use intensity (EUI) pre- and post-modification. The findings highlighted the efficiency of the ANN approach, which achieved a deviation rate of 14.8%, outperforming the multiple linear regression (MLR) approach. The same year, Kim et al. [74][31] introduced a cost-focused demand approach for multi-office buildings, balancing HVAC energy expenses with user comfort. A user-friendly digital platform was introduced, allowing users to express their comfort attributes. Consequently, these attributes were interpreted using neural computation methods and embedded into the demand regulation plan. The empirical findings confirmed the efficacy of the methodology in refining heat pump functionality and ensuring user comfort, thereby minimizing potential deviations from the ideal demand-response scheme. In an interesting work by Wang et al. in 2019 [75][32], a long short-term memory (LSTM) network (a special type of RNN) was employed to forecast diverse electrical loads, illumination demands, occupant numbers, and intrinsic heat increments in two U.S. office buildings. Building A was located in Berkeley, was constructed in 2015, and occupied 6397 m2, whereas Building B was located in Philadelphia, was constructed in 1911, and occupied 6410 m2. The strength of LSTMs lies in their ability to remember patterns over time, which makes them well suited for time-series prediction tasks.
Using simulation data collected in 2018 and 2014, respectively, the LSTM-based control strategies reduced the prediction errors for internal heat increments, relative to the pre-established schedules suggested by ASHRAE guidelines, from 12% to 8% in Building A and from 26% to 16% in Building B. In 2019, Peng et al. [76][33] showcased a design methodology and a regulation scheme with learning properties in order to enable HVAC systems to adjust to occupant thermal preferences under dynamically changing conditions. Four basic variables were utilized in order to generate datasets for the preference models: time, indoor weather conditions, outdoor weather conditions, and occupants’ behavior. An ANN framework, along with suitable hyperparameters, was trained on the different thermal preferences. For five months, the learning-based thermal preference control (LTPC) was applied to an HVAC system in single- and multi-user office spaces under real-life conditions. The results highlighted energy conservation of between 4% and 25% compared to the static temperature baselines. Moreover, the necessity of user intervention regarding temperature changes was decreased from 4–9 days per month to 1 day per month. An important work by Sendra et al. [77][34] introduced the design and development of an ANN predictor specifically designed to anticipate the next day’s power utilization of a building’s HVAC system. The highlighted HVAC system was located within MagicBox in Madrid, an actual self-sustaining solar-powered dwelling equipped with a surveillance mechanism. In order to model the predictor, multiple LSTM neural network architectures were proposed, along with appropriate data preparation methods, to refine the raw dataset. According to the evaluation, the LSTM networks achieved significant results, with test errors (NRMSE) held at 0.13 and a correlation of 0.797 between the predictions and the actual test time series.
The findings were compared with a simplified one-hour-ahead prediction that provided nearly optimal results, offering promising insights into real-time energy prediction in building structures. In 2021, notable research was conducted by Elmaz et al. [78][35], who presented a novel convolutional neural network–long short-term memory (CNN–LSTM) architecture, which integrated superior feature extraction and sequential learning capabilities for room temperature prediction. The control framework was built using a range of variables collected from a room at Antwerp University. The approach was compared to the conventional multi-layer perceptron (MLP) and standard LSTM approaches over prediction horizons of 1 to 120 min. Despite the efficient performance of all the concerned ANN frameworks at the 1-min horizon, the CNN–LSTM proved to be more stable and accurate in extended horizons, maintaining an R2 > 0.9 over a 120-min prediction horizon.
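Most of the ANN studies above ultimately train a feedforward network with backpropagation to regress energy use from a handful of features. The sketch below shows that core loop in plain NumPy on synthetic data; the features, target function, and network size are illustrative assumptions, not any cited building's dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "hourly HVAC load": a function of hour-of-day and outdoor temperature.
hours = rng.uniform(0, 24, size=(512, 1))
t_out = rng.uniform(-5, 35, size=(512, 1))
X = np.hstack([np.sin(2 * np.pi * hours / 24), np.cos(2 * np.pi * hours / 24), t_out / 35.0])
y = 0.6 * X[:, [2]] + 0.3 * X[:, [0]] + 0.1   # assumed target, for illustration only

# One hidden tanh layer, trained by full-batch gradient descent on 0.5 * MSE.
W1 = rng.normal(0, 0.5, size=(3, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, size=(8, 1)); b2 = np.zeros(1)
lr = 0.1

for _ in range(2000):
    h = np.tanh(X @ W1 + b1)              # forward pass
    pred = h @ W2 + b2
    err = pred - y                        # dLoss/dpred
    gW2 = h.T @ err / len(X); gb2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)      # backpropagate through tanh
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

rmse = float(np.sqrt(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2)))
print(round(rmse, 3))  # should land well below the target's ~0.29 standard deviation
```

The LSTM and CNN–LSTM architectures discussed above add memory cells and convolutional feature extractors on top of this same gradient-based training recipe to capture temporal patterns in the load series.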

2.3. Literature Review of Fuzzy Logic Control Applications

In 2015, Saepullah et al. [79][36] investigated the use of three different fuzzy inference methods: Mamdani, Sugeno, and Tsukamoto. The research was based on experiments using various room temperature and humidity settings as inputs and compressor speeds as the output. In the first experiment, at 27 °C and 44% humidity, the Mamdani method resulted in energy savings of 34.9%. However, the Tsukamoto and Sugeno methods yielded greater energy savings of 58.099% and 73.8%, respectively. In the second experiment, with a room temperature of 33 °C and humidity of 68%, the Mamdani method resulted in energy savings of 13.8%. In comparison, the Tsukamoto method performed better, with energy savings of 31.176%. After comparing the results of all three methods, the researchers concluded that the Tsukamoto method was the most effective in terms of reducing electrical energy consumption, with average energy savings of 74.2775%. Also in 2015, Keshtkar et al. [80][37] proposed a methodology for utilizing fuzzy logic, along with wireless technology and smart grid bonuses, in order to eliminate energy wastage in domestic heating and cooling systems. Digitally controlled thermostats (PCTs) regulated these systems, aiming to cut energy usage while preserving daily routines according to demand-response (DR) programs, time-of-use (TOU), and real-time pricing (RTP) metrics. Since manually adjusting energy use for such a proposal represents a cumbersome procedure for household users, the fuzzy logic methodology was incorporated into the PCTs in order to enhance intelligence for reducing loads and safeguarding comfort levels. The PCT was replicated in the simulation software, acting as a test framework for the efficiency of the fuzzy logic method across various scenarios. 
According to the results, this approach efficiently controlled set points while maintaining comfort by employing specific rules based on data from sensors and smart incentives, thereby providing superior energy and cost-efficiency compared to the traditional PCT approach. In 2016, Ulpiani et al. [81][38] examined the energy and comfort outcomes of three distinct control strategies (binary, PID, and FLC) for managing a heating system in a green construction. Experiments were conducted in a test structure fitted with electric heaters and an array of sensors monitoring the interior and exterior thermal states. Assessments were conducted in real time over a span of roughly a week during unoccupied, varied seasonal conditions in a Mediterranean environment. Each control strategy was evaluated for both comfort and energy efficiency, and as the results indicated, the Mamdani fuzzy logic controller (FLC) outperformed the other control schemes, achieving energy savings of between 30% and 70% while consistently retaining adequate comfort parameters. Similarly, in 2017, Keshtkar et al. [82][39] proposed a solution for energy management in residential HVAC systems, with an adaptable, autonomous approach using supervised fuzzy logic learning (SFLL), wireless sensor capabilities, and dynamic electricity pricing to create a smart thermostat. In cases where user interaction affected system decisions, the adaptive fuzzy logic model (AFLM) was incorporated to adapt to new user preferences. To simulate a flexible residential environment, a house energy simulator was developed, incorporating an HVAC system, smart meter, and thermostat. The autonomous thermostat demonstrated a 21.3% energy-saving potential over a month of simulation, exhibiting its capability to modify daily temperature set points, thereby conserving energy and costs without sacrificing user comfort or requiring user intervention.
Notably, in cases where the user altered schedules or preferences, the AFLM illustrated a strong capacity for adaptation, learning from these alterations while maintaining energy efficiency. In a 2018 study, Ain et al. [83][40] developed fuzzy inference systems (FISs) (both Mamdani and Sugeno), which used humidity and room temperature changes to optimize thermostat settings, thereby enhancing user comfort and energy efficiency. An automatic rule generation approach helped manage the complexity of the system. The lightweight FIS, compatible with IoT systems like RIOT, showed a reduction in energy consumption of 28% in simulations. Incorporating additional factors, such as outdoor temperature, occupancy, set points, and price tariffs, increased energy savings by up to 50% in the worst-case scenarios. According to the results, the methodology, evaluated under various environmental conditions, demonstrated that the Mamdani FIS performed better in warm climates, whereas the Sugeno FIS excelled in colder ones. In another important work from 2019, Li et al. [84][41] introduced a unique method for efficiently regulating indoor climates using real-time tracking of occupants’ thermal sensations. By employing a fuzzy logic assessment, cumulative thermal feedback from all occupants was established, which was processed by a linear algorithm to refine temperature guidelines. The objective was to instantly respond to occupants’ thermal preferences without requiring manual data entry. According to the evaluation, the proposed method could account for individual comfort levels, resulting in timely and accurate temperature modifications, while the participants reported higher comfort levels (score of 5.56) under the new system compared to a score of 5.10 using the traditional method. Significant energy savings were also achieved, demonstrating energy conservation of 20.5% for AHUs and 13.4% for water loops.
Overall, there was a reduction of 13.8% in energy consumption compared to conventional approaches.
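The fuzzy inference underlying these FLC studies can be illustrated with a minimal sketch: a temperature error is fuzzified through triangular membership functions, a small rule base fires, and a crisp heater command is recovered by defuzzification. For brevity, singleton consequents (zero-order Sugeno style) stand in for the full Mamdani output sets of the cited works, and all membership functions, rules, and output levels below are hypothetical:

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a, c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def heater_power(error_c):
    """Map temperature error (setpoint - indoor, in deg C) to heater power %."""
    # Fuzzify the error into three linguistic terms.
    cold = tri(error_c, 0.0, 3.0, 6.0)
    ok   = tri(error_c, -2.0, 0.0, 2.0)
    warm = tri(error_c, -6.0, -3.0, 0.0)
    # Each rule maps one term to a crisp power level (singleton consequents);
    # defuzzify via the firing-strength-weighted average.
    levels = {80.0: cold, 30.0: ok, 0.0: warm}
    total = sum(levels.values())
    if total == 0.0:
        return 0.0
    return sum(level * w for level, w in levels.items()) / total

print(heater_power(3.0))   # strongly "cold" -> high power
print(heater_power(0.0))   # on setpoint -> moderate power
```

In practice the reviewed controllers use many more inputs (humidity, occupancy, price signals) and rules, but the fuzzify/infer/defuzzify loop is the same.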
Table 3. Summary of FLC model-free approaches for HVAC control (2015–2023).

2.5. Literature Review of Other Model-Free Control Applications

In 2016, Cai et al. [93][50] introduced a versatile multi-agent control approach that was suitable for optimizing building energy systems in a “plug-and-play” fashion in order to reduce building-specific engineering efforts. To facilitate distributed decision making, two distinct consensus-based distributed optimization algorithms—a subgradient method and an alternating direction method of multipliers (ADMM)—were adjusted and integrated within the framework. The overall approach was validated via simulations in two case studies: the optimization of a chilled water cooling plant and the optimal control of a direct-expansion (DX) air-conditioning system serving a multi-zone building. In both cases, the multi-agent controller effectively found near-optimal solutions, resulting in an overall energy savings potential of 42.7% compared to the baseline centralized conventional approach. In 2017, Wang et al. [94][51] addressed the complex issue of reducing the long-term cumulative cost of the HVAC system in a multi-zone commercial building within a smart grid framework. The total cost of the objective function was depicted as a combination of the energy expenses and the cost related to thermal discomfort. The paper formulated a stochastic program that considered various uncertainties, such as fluctuations in electricity prices, outdoor temperatures, preferred comfort levels, and external thermal disruptions. Moreover, the constraints involved were coupled both spatially and temporally, and the uncertainty of future parameters added further challenges. To address this problem, the authors introduced a real-time HVAC control algorithm that utilized Lyapunov optimization techniques. This innovative approach did not require predictions or specific knowledge about stochastic information, focusing instead on constructing and stabilizing virtual queues connected to indoor temperatures across various zones. 
The implementation of the proposed cost-aware real-time algorithm (CDRA) was distributed, emphasizing user privacy and improving scalability. Through extensive simulations based on real-world data, the study demonstrated that the introduced algorithm could effectively reduce energy costs by up to 52.43% while only minimally impacting thermal comfort. In 2017, Peng et al. [95][52] employed a k-nearest neighbor (k-NN) learning-based, demand-driven control strategy for sensible cooling, aimed at predicting the occupants’ future presence and the duration of that presence for the rest of the day by learning from their past and current behaviors. The research approach integrated seven months of occupancy data from motion signals in six offices occupied by ten individuals in a commercial building, encompassing both private and multi-person offices. The predicted occupancy information was indirectly used to deduce setback temperature set points, based on specific rules outlined in the study. During a two-month period, both a baseline control and the innovative demand-driven control were deployed over forty-two weekdays of real-world occupancy. According to the final evaluation, the use of demand-driven control led to an energy saving of 20.3% compared to the standard benchmark. Similarly, in 2018, Peng et al. [96][53] focused on enhancing the efficiency of HVAC systems by adapting them to occupants’ real-time behavior, specifically in office environments. Since rooms in office buildings are not continuously occupied during scheduled HVAC service times, there exists potential to reduce unnecessary energy usage linked with occupants’ actions. To address this, the study conducted a comprehensive analysis of occupants’ unpredictable behavior within an office building and proposed a demand-driven control strategy. This strategy automatically adapted to occupants’ energy-related actions to reduce energy consumption while maintaining room temperatures comparable to static cooling. 
The approach by Peng et al. included two kinds of machine learning techniques: unsupervised and supervised learning. These data-based approaches were adapted to occupants’ behavior in two distinct learning processes. The information gathered about occupancy was then utilized through a defined set of rules to deduce real-time room set points for managing the office space’s cooling system. This method aimed to minimize the need for human involvement in the control of the cooling system. The proposed strategy was put into practice for controlling the cooling system in real-world office settings, covering three typical office types—single-person offices, multi-person offices, and meeting rooms—across eleven case study office spaces. The experiments demonstrated energy savings ranging from 7% to 52% compared to traditionally scheduled cooling systems. In 2020, Li et al. [97][54] proposed a distributed multi-agent approach for the optimal control of multi-zone ventilation systems. The approach focused on indoor air quality (IAQ) and energy use by optimizing individual room ventilation volumes and the primary air-handling unit (PAU). The complex optimization problem was divided into simpler parts handled by distributed agents, each representing an individual room or the PAU, with a coordinating agent integrating their results to find the optimal solution. To validate the suggested multi-agent-based decentralized control method, two control scenarios under varying external weather conditions were executed on a TRNSYS-MATLAB collaborative simulation platform, comparing it with a standard control method and a unified optimal control method. The results showed that the distributed approach could match the optimal output of the centralized control approach.
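The demand-driven logic of Peng et al. [95][52] can be sketched as follows: a k-NN classifier predicts from the day's occupancy pattern so far whether the occupant will return, and a simple rule relaxes the cooling set point when an absence is expected. The occupancy profiles, value of k, and set-point rule below are hypothetical stand-ins, not the study's data:

```python
from collections import Counter

# Historical daily profiles: occupancy per 2-hour slot (1 = present),
# paired with whether the occupant returned later that day (1/0).
history = [
    ([1, 1, 1], 1), ([1, 1, 0], 1), ([1, 0, 0], 0),
    ([0, 1, 1], 1), ([0, 0, 0], 0), ([1, 0, 1], 1),
]

def knn_predict(pattern, k=3):
    """Predict 'returns later' (1/0) by majority vote of the k most
    similar historical patterns (Hamming distance)."""
    dist = lambda p: sum(a != b for a, b in zip(pattern, p))
    nearest = sorted(history, key=lambda item: dist(item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cooling_setpoint(pattern, comfort=24.0, setback=27.0):
    """Rule: hold the comfort set point if a return is predicted;
    otherwise let the zone drift to the setback temperature."""
    return comfort if knn_predict(pattern) == 1 else setback

print(cooling_setpoint([1, 1, 1]))  # occupied all morning -> comfort set point
print(cooling_setpoint([1, 0, 0]))  # left early -> setback
```

The cited work predicts both presence and its remaining duration from months of motion-sensor data; this sketch keeps only the predict-then-apply-rule structure.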
Table 5. Summary of other model-free approaches for HVAC control (2015–2023).

2.4. Literature Review of Hybrid Model-Free Control Applications

In 2015, an interesting hybrid methodology was proposed by Hussain et al. [85][42], employing computational intelligence and optimization strategies to enhance a fuzzy controller’s efficacy within an HVAC system. The goal was to moderate energy use without compromising the comfort of the inhabitants. EnergyPlus was used to compute the predicted mean vote (PMV) and predicted percentage dissatisfied (PPD) indices, whereas the fuzzy controller and the optimization framework were co-simulated using BCVTB and Simulink. These techniques were compared with EnergyPlus’s traditional thermal control of HVAC. The study concluded that a genetic algorithm (GA) could be used to fine-tune the fuzzy logic controller (FLC) to achieve an improved outcome. Compared to EnergyPlus, the PMV was reduced and the overall energy consumption decreased by 16.1% for cooling and 18.1% for heating. Also in 2015, Wei et al. [86][43] utilized an ensemble approach based on multi-layer perceptron (MLP) ANNs to construct a comprehensive energy model for a building. The model incorporated three indoor air quality representations: a building temperature model, a building relative humidity model, and a building CO2 concentration model. To strike a balance between power usage and indoor air quality, a four-objective optimization problem was designed. This problem was addressed using a revised particle swarm optimization (PSO) algorithm, yielding control parameters for the supply air temperature and static pressure of the air-handling unit. By assigning varying weights to the objectives within the model, the derived control parameters optimized the HVAC system by trading off power usage against thermal comfort. 
According to simulated evaluations, the multi-layer perceptron (MLP) ensemble procedure demonstrated superior performance compared to seven other techniques and was chosen to form the comprehensive energy model and the trio of indoor air quality (IAQ) models. The total power conservation for the dataset examined in this paper was 17.4% when IAQ restrictions were not applied and 12.4% when IAQ limitations were imposed for one of the eight user preference scenarios. In 2016, Attaran et al. [87][44] suggested a novel approach for the energy optimization of HVAC systems using a combination of the radial basis function neural network (RBFNN) and the epsilon constraint (EC) PID approach. This innovative hybrid method leveraged the RBFNN in the HVAC system to predict residual discrepancies, amplify the control signal, and diminish errors. The main aim of this work was to design and test the EC-RBFNN for a self-adjusting PID controller tailored for a distinct bilinear HVAC system, focusing on temperature and humidity control. Comparative simulation case studies revealed the superior precision of the EC-RBFNN method over the standard PID optimization and combined PID-RBFNN. In 2017, a self-learning control strategy for HVAC systems was proposed by Ghahramani et al. [88][45] to adjust a building’s HVAC parameters for optimal efficiency and comfort. The specific system combined three key elements: (i) a metaheuristic element employing a k-nearest neighbor (k-NN) stochastic hill-climbing technique; (ii) a machine learning framework using a decision tree (DT) for regression analysis; and (iii) a self-tuning module that carries out a recursive brute-force search. The control strategy sets daily optimal parameters as its primary method of control, ensuring that it enhances, rather than disrupts, existing building management systems. To assess the performance of the novel strategy, Ghahramani et al. employed the reference model of a small office building from the U.S. 
Department of Energy across all U.S. climate zones. By simulating various control policies using the EnergyPlus software, the novel framework algorithm led to energy savings of 31.17% compared to standard operations (22.5 °C and 3 K). In terms of measurement accuracy across all climate zones, as defined by the normalized root-mean-square error, the algorithm demonstrated a performance score of 0.047. A data-driven neuro-fuzzy approach was presented in 2018 by Sala-Cardoso et al. [89][46] for the provision of short-term predictions concerning the HVAC thermal power demand in smart buildings. The innovation lies in estimating the building’s activity level to enhance the prediction system’s response and context awareness, thus increasing accuracy by factoring in the building’s usage pattern. The methodology combined a recurrent neural network (RNN), which learns the dynamics of a specially developed activity indicator, with an adaptive neuro-fuzzy inference system, which correlates activity predictions with outdoor and bus return temperatures to describe the building’s HVAC thermal power demand. An estimation method was also proposed for the indirect monitoring of the aggregated power consumption of the terminal units. Real data from a research building were experimentally utilized for the evaluation of the hybrid approach, and according to the results, a substantial performance enhancement was observed compared to the baseline methodologies, achieving a mean absolute error below 10%. In their 2019 research, Satrio et al. [90][47] evaluated the annual energy usage and thermal comfort in a university building equipped with radiant cooling and a variable air-volume (VAV) system, as measured by the predicted percentage of dissatisfied (PPD) value. A multi-goal optimization approach combining artificial neural networks (ANNs) and multi-objective genetic algorithms (MOGAs) was effectively used to determine the optimal operation of the building. 
The specifically designed ANN configuration demonstrated accurate predictions during the training phase, as indicated by a root-mean-square error (RMSE) of 0.3 for energy consumption and 1 for the PPD value. The multi-objective optimization revealed substantial enhancements in the operation of the HVAC system in terms of thermal comfort while maintaining low annual energy consumption compared to the base design. Recent studies have shown that deep reinforcement learning holds great potential for controlling HVAC systems. However, its complex nature and slow computation limit its practical use in real-time HVAC optimal control. To address this issue, in 2019, Zhang et al. [91][48] proposed a practical RL control framework called BEM-DRL. The control framework was tested in a real-life application, considering a commercial office that integrated a novel radiant heating system. The control scheme comprised a four-step integration: building energy modeling, model calibration, deep reinforcement learning training, and control deployment. The results of the 78-day real-life evaluation showed that the BEM-DRL framework achieved a 16.7% reduction in heating demand with more than 95% probability compared to the existing rule-based control (RBC) approach. In another hybrid control approach from 2022, Ren et al. [92][49] proposed a novel prediction-driven optimization strategy for real-time scheduling based on upcoming environmental patterns. Utilizing an advanced deep reinforcement learning technique (dueling DDQN), the home energy management system (HEMS) was dispatched optimally. Given the non-standard distribution of HVAC temperature data both indoors and outdoors, a unique generalized correntropy-assisted long short-term memory (GC-LSTM) neural model was presented. This model leveraged the generalized correntropy (GC) loss function for outdoor temperature forecasts. By implementing this technique in an HEMS scenario, the results revealed a notable decrease in user cost while maintaining user comfort.
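The GA-based tuning used by Hussain et al. [85][42] can be illustrated with a toy example: a genetic algorithm (elitist selection, blend crossover, Gaussian mutation) searches for a controller parameter that trades off discomfort against energy. The surrogate cost model, parameter bounds, and GA settings below are hypothetical stand-ins, not the paper's:

```python
import random

random.seed(0)  # make the sketch reproducible

def cost(setpoint):
    """Surrogate objective: discomfort grows away from 22.5 C, while
    energy grows with heating above 20 C (both invented for illustration)."""
    discomfort = (setpoint - 22.5) ** 2
    energy = 0.4 * max(0.0, setpoint - 20.0) ** 2
    return discomfort + energy

def ga_minimize(lo=18.0, hi=28.0, pop_size=20, generations=40):
    pop = [random.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        elite = pop[: pop_size // 2]          # selection: keep the best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            child = (a + b) / 2               # crossover: blend two parents
            child += random.gauss(0.0, 0.3)   # mutation: small perturbation
            children.append(min(hi, max(lo, child)))
        pop = elite + children
    return min(pop, key=cost)

best = ga_minimize()
print(round(best, 2))  # settles near the optimum of the surrogate cost
```

In the cited study the chromosome instead encodes the FLC's membership-function parameters, and each fitness evaluation is a full EnergyPlus co-simulation rather than a closed-form cost.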
Table 4. Summary of hybrid model-free approaches for HVAC control (2015–2023).
A 2021 study by Biemann et al. [59][16] evaluated the effectiveness of four actor–critic algorithms in a simulated data center by assessing their ability to maintain thermal stability and enhance energy efficiency while adapting to weather changes. The focus was on data efficiency, given its practical importance. The performance of the off-policy actor–critic algorithms (SAC and TD3) was compared to the on-policy PPO and TRPO approaches, as well as to a model-based controller used in EnergyPlus. In the implementation, the HVAC system comprised multiple components, including an air economizer, a variable-volume fan, a direct–indirect evaporative cooler, cooling coils (one in the west zone and a chilled-water cooling coil in the east zone), and an outdoor air damper. The results indicated that all the applied RL algorithms were able to maintain the hourly average temperature within the desired range while reducing energy consumption by at least 10%. With increasing training, a smaller trade-off was observed between thermal stability and energy reduction. In a substantial contribution to the field in 2021, Du et al. [60][17] introduced a unique methodology to optimize multi-zone residential HVAC systems, employing a deep reinforcement learning (DRL) technique. The primary objective of their research was to minimize energy consumption costs while ensuring user comfort. The methodology, known as the deep deterministic policy gradient (DDPG), exhibited effectiveness in learning through ongoing interactions with a simulated building environment, devoid of any prior model knowledge. According to the simulation results, the DDPG-based HVAC control strategy surpassed the contemporary deep Q-network (DQN), achieving a reduction in the energy consumption cost of 15% and a significant decrease in the comfort violation of 79%. 
Furthermore, when compared against a rule-based HVAC control strategy, the DDPG-based strategy demonstrated remarkable efficacy, mitigating the comfort violation by an impressive 98%. In a 2021 study, Gupta et al. [61][18] contributed another intriguing approach to HVAC control. In this research, the authors introduced a deep reinforcement learning (DRL) heating controller designed to enhance thermal comfort while minimizing energy costs in smart buildings. The efficacy of the controller was rigorously assessed through comprehensive simulation experiments employing real-world outdoor temperature data. The results obtained through these evaluations corroborated the superiority of the proposed DRL-based controller over traditional thermostat controllers. This novel approach demonstrated improvements in thermal comfort ranging from 15% to 30%, alongside reductions in energy costs ranging from 5% to 12%. The study further extended its investigations to compare the performance of a centralized DRL-based controller with a decentralized configuration, where each heating unit possessed its own DRL-based controller. The empirical findings revealed that as the number of buildings and the variance in their set-point temperatures rose, the decentralized control configuration exhibited superior performance compared to its centralized counterpart. In 2022, by utilizing the proximal policy optimization (PPO) principles from reinforcement learning, Li et al. [62][19] employed a neural network to develop a comprehensive model for producing discrete control actions, specifically thermostat adjustments. A novel method for minimizing the objective function was introduced to constrain the size of the update steps, thereby increasing the algorithm’s stability. As a result, a co-simulation platform for the thermal storage air-conditioning system was created, linking TRNSYS (https://www.trnsys.com/ 2022) and MATLAB (https://www.mathworks.com/products/matlab.html 2022). 
This research developed a demand-response strategy informed by time-of-use electricity pricing, considering elements like the environment, thermal comfort, and energy usage. The proposed RL algorithm was able to adapt the thermostat adjustments during demand-response periods, and thus, the findings demonstrated efficiency in controlling temperature set points. Moreover, compared to a non-thermal-storage air-conditioning system with a fixed set point, the approach resulted in an operational cost reduction of 9.17%, indicating the potential of the tool in optimizing HVAC systems. A 2022 study by Lei et al. [63][20] proposed a practical control framework based on DRL to integrate personalized thermal comfort and the presence of occupants. A branching dueling Q-network (BDQ), an advanced learning agent, was employed to effectively manage the complex, multi-dimensional control tasks associated with HVAC systems. Additionally, a personal comfort modeling method based on tabular data was incorporated, allowing for seamless integration into operations that involve human users. The BDQ agent was first trained in a simulated environment and then applied in a real office setting, where it executed five-dimensional optimization decisions. This real-world deployment allowed the collection of real-time comfort feedback from users and, according to the outlined advantages, resulted in a notable 14% decrease in energy usage for cooling and an 11% improvement in overall thermal comfort. In 2022, Deng et al. [64][21] proposed a novel non-stationary deep Q-network (DQN) methodology in order to address the dynamic behavior of HVAC systems in buildings. This methodology was able to identify the points at which the environment in a building altered and generate optimal control decisions for the HVAC system under these evolving conditions. The non-stationary DQN method outperformed the existing DQN method in both single- and multi-zone control efforts. 
Moreover, the simulation results revealed that the novel methodology was able to save up to 13% more energy and enhance thermal comfort by 9% compared to the conventional DQN methodology. Also in 2022, Yu et al. [65][22] proposed the synergistic control of personal comfort systems (PCSs) and a central HVAC setup in a co-working office environment, aiming at optimal energy usage and individual thermal comfort. The study was initiated by establishing an energy optimization challenge for both the PCSs and the HVAC mechanism. Given the ambiguous nature of the thermal behavior and fluctuating variables of buildings, addressing this challenge was complex. Hence, Yu et al. transformed the issue into a Markovian game with diverse participants. By introducing an innovative real-time control strategy, the scientists utilized an attention-centric multi-agent deep reinforcement learning methodology, bypassing the need for detailed thermal behavior models or preliminary data on variables. Practical simulations indicated that the approach decreased energy usage by up to 4.18% and minimized thermal comfort variance by approximately 72.08% compared to existing baseline benchmarks.
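Several of the studies above (e.g., Li et al. [62][19]) build on PPO, whose central ingredient is a clipped surrogate objective that bounds how far each policy update can move. A minimal numpy sketch of that loss follows; the batch values are illustrative, not taken from the cited papers:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate loss L = -E[min(r * A, clip(r, 1-eps, 1+eps) * A)],
    where r = pi_new(a|s) / pi_old(a|s) and A is the advantage estimate."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -np.mean(np.minimum(unclipped, clipped))

# Probability ratios and advantage estimates for a small batch of transitions.
ratio = np.array([0.9, 1.0, 1.5, 0.5])
advantage = np.array([1.0, -1.0, 2.0, -2.0])
print(ppo_clip_loss(ratio, advantage))
```

The clipping removes any incentive to push the new policy's action probabilities more than a factor of 1±eps away from the old policy in a single update, which is the stability property the thermostat-adjustment study relies on.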

2.2. Literature Review of Artificial Neural Network Control Applications

In noteworthy research from 2015, Huang et al. [66][23] proposed an ANN methodology for modeling multi-zone buildings, taking into account various energy inputs and thermal interactions between zones. This framework enabled accurate temperature predictions and reduced energy consumption. According to the results, the size of an ANN does not necessarily dictate its accuracy. The optimal ANN usually contains an order number no greater than four, as oversized networks may result in large prediction errors with high-frequency noise. The study further illustrated that the proposed multi-zone model exhibited faster computational speed than single-zone models, thereby enabling the development of more accurate and effective ANN-based predictive control. In 2016, Sholahudin et al. [67][24] proposed a strategy for forecasting the hourly heating load of a building, relying on various input parameter combinations via a dynamic ANN. The heating load of a standard apartment complex in Seoul was simulated over a winter month using the EnergyPlus software (https://energyplus.net/ 2016). The acquired datasets were then utilized to train the time-delay neural network (TDNN) models. The Taguchi method was employed to explore the impact of individual input parameters on the heating load: dry-bulb temperature, dew-point temperature, direct normal radiation, diffuse-horizontal radiation, and wind velocity. The findings revealed that external temperature and wind velocity were the most impactful parameters, and the dynamic model yielded superior outcomes compared to the static model. To this end, the Taguchi method effectively curtailed the number of input parameters, and the dynamic ANN accurately forecasted immediate heating loads using a curtailed number of inputs. Also in 2016, Javed et al. [68][25] integrated a decentralized smart controller into an Internet of Things (IoT) framework coupled with cloud computing for the training of a random neural network (RandNN). 
The network assessed parameters such as temperature, humidity, HVAC airflow, and passive infrared sensor (PIR) data. The RandNN-based controller comprised three primary elements: a base station, sensor nodes, and cloud intelligence, each endowed with distinct functionalities. A sensor node with an embedded RandNN-based occupancy estimator approximated the number of occupants in the room and communicated this information to the base station. Accordingly, the base station, equipped with RandNN models, regulated the HVAC based on the set points for heating and cooling. The real-life implementation was compared to basic RBC controllers, illustrating the RandNN controller’s ability to reduce HVAC energy consumption by 27.12%. Also in 2016, Chae et al. [69][26] introduced a near-term building energy consumption prediction framework using an ANN enhanced with a Bayesian regularization algorithm. Predicting electricity consumption on a sub-hourly basis was challenging given the intricate consumption trends and data variability. The approach delved into the impact of network design parameters like time lag, hidden neuron count, and training dataset on the model’s performance and adaptability. According to the simulation findings in three urban office buildings, the developed model was able to accurately predict electricity use in 15-min increments and daily peak consumption in a commercial building cluster test scenario by utilizing adaptive training techniques. In a 2017 research effort, Chen et al. [70][27] proposed a data-based methodology that established a loop for precise predictive modeling and real-time regulation of building thermal dynamics. The approach was based on a deep recurrent neural network (RNN), which made use of substantial amounts of sensor data. The trained RNN was subsequently incorporated directly into a finite horizon-constrained optimization issue. 
After transforming the constrained optimization into an unconstrained problem, the scientists employed an iterative gradient descent approach with momentum to determine the optimal control inputs. The simulation results revealed that the proposed method enhanced performance compared to the model-based approach in both building system modeling and control. According to the results, the RNN approach identified a series of control decisions sufficient to reduce energy usage by 30.74%, whereas the solution identified by the RC model provided a mere 4.07% reduction in energy consumption.
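The inner loop of Chen et al. [70][27], where optimal control inputs are found by gradient descent with momentum through a trained predictive model, can be sketched as follows. A toy quadratic surrogate stands in for the trained RNN, and the target values, learning rate, and momentum factor are hypothetical:

```python
import numpy as np

def predicted_cost(u):
    """Stand-in surrogate: predicted energy + comfort penalty for a
    2-dim control input u (e.g. supply temperature, fan setting)."""
    target = np.array([21.0, 0.6])
    return np.sum((u - target) ** 2)

def grad(u, h=1e-5):
    """Finite-difference gradient of the surrogate (a trained network
    would supply this via backpropagation instead)."""
    g = np.zeros_like(u)
    for i in range(len(u)):
        e = np.zeros_like(u)
        e[i] = h
        g[i] = (predicted_cost(u + e) - predicted_cost(u - e)) / (2 * h)
    return g

u = np.array([25.0, 1.0])          # initial control guess
v = np.zeros_like(u)               # momentum buffer
for _ in range(200):
    v = 0.9 * v - 0.05 * grad(u)   # momentum update
    u = u + v                      # descend toward the predicted optimum
print(np.round(u, 2))              # converges toward the surrogate's minimum
```

The momentum term smooths the descent trajectory; in the cited work the same update is run over a finite control horizon, with the trained RNN providing both the cost prediction and its gradients.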
Table 2. Summary of ANN model-free approaches for HVAC control (2015–2023).

References

  1. Seyam, S. Types of HVAC Systems. In HVAC System; InTech Open: London, UK, 2018; pp. 49–66. Available online: https://www.intechopen.com/chapters/62059 (accessed on 12 October 2023).
  2. American Society of Heating, Refrigerating and Air-Conditioning Engineers, Inc. HVAC Systems and Equipment; American Society of Heating, Refrigerating, and Air Conditioning Engineers: Atlanta, GA, USA, 1996; Volume 39.
  3. Barrett, E.; Linder, S. Autonomous hvac control, a reinforcement learning approach. In Proceedings of the Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2015, Porto, Portugal, 7–11 September 2015; Proceedings, Part III 15. Springer: Berlin/Heidelberg, Germany, 2015; pp. 3–19.
  4. Wei, T.; Wang, Y.; Zhu, Q. Deep reinforcement learning for building HVAC control. In Proceedings of the 54th Annual Design Automation Conference, Austin, TX, USA, 18 June 2017; pp. 1–6.
  5. Wang, Y.; Velswamy, K.; Huang, B. A long-short term memory recurrent neural network based reinforcement learning controller for office heating ventilation and air conditioning systems. Processes 2017, 5, 46.
  6. Chen, Y.; Norford, L.K.; Samuelson, H.W.; Malkawi, A. Optimal control of HVAC and window systems for natural ventilation through reinforcement learning. Energy Build. 2018, 169, 195–205.
  7. Chen, B.; Cai, Z.; Bergés, M. Gnu-rl: A precocial reinforcement learning solution for building hvac control using a differentiable mpc policy. In Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, New York, NY, USA, 13–14 November 2019; pp. 316–325.
  8. Valladares, W.; Galindo, M.; Gutiérrez, J.; Wu, W.C.; Liao, K.K.; Liao, J.C.; Lu, K.C.; Wang, C.C. Energy optimization associated with thermal comfort and indoor air control via a deep reinforcement learning algorithm. Build. Environ. 2019, 155, 105–117.
  9. Liu, T.; Xu, C.; Guo, Y.; Chen, H. A novel deep reinforcement learning based methodology for short-term HVAC system energy consumption prediction. Int. J. Refrig. 2019, 107, 39–51.
  10. Zhang, C.; Kuppannagari, S.R.; Kannan, R.; Prasanna, V.K. Building HVAC scheduling using reinforcement learning via neural network based model approximation. In Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, New York, NY, USA, 13–14 November 2019; pp. 287–296.
  11. Gao, G.; Li, J.; Wen, Y. Energy-efficient thermal comfort control in smart buildings via deep reinforcement learning. arXiv 2019, arXiv:1901.04693.
  12. Azuatalam, D.; Lee, W.L.; de Nijs, F.; Liebman, A. Reinforcement learning for whole-building HVAC control and demand response. Energy AI 2020, 2, 100020.
  13. Zou, Z.; Yu, X.; Ergan, S. Towards optimal control of air handling units using deep reinforcement learning and recurrent neural network. Build. Environ. 2020, 168, 106535.
  14. Lork, C.; Li, W.T.; Qin, Y.; Zhou, Y.; Yuen, C.; Tushar, W.; Saha, T.K. An uncertainty-aware deep reinforcement learning framework for residential air conditioning energy management. Appl. Energy 2020, 276, 115426.
  15. Yu, L.; Sun, Y.; Xu, Z.; Shen, C.; Yue, D.; Jiang, T.; Guan, X. Multi-agent deep reinforcement learning for HVAC control in commercial buildings. IEEE Trans. Smart Grid 2020, 12, 407–419.
  16. Biemann, M.; Scheller, F.; Liu, X.; Huang, L. Experimental evaluation of model-free reinforcement learning algorithms for continuous HVAC control. Appl. Energy 2021, 298, 117164.
  17. Du, Y.; Zandi, H.; Kotevska, O.; Kurte, K.; Munk, J.; Amasyali, K.; Mckee, E.; Li, F. Intelligent multi-zone residential HVAC control strategy based on deep reinforcement learning. Appl. Energy 2021, 281, 116117.
  18. Gupta, A.; Badr, Y.; Negahban, A.; Qiu, R.G. Energy-efficient heating control for smart buildings with deep reinforcement learning. J. Build. Eng. 2021, 34, 101739.
  19. Li, Z.; Sun, Z.; Meng, Q.; Wang, Y.; Li, Y. Reinforcement learning of room temperature set-point of thermal storage air-conditioning system with demand response. Energy Build. 2022, 259, 111903.
  20. Lei, Y.; Zhan, S.; Ono, E.; Peng, Y.; Zhang, Z.; Hasama, T.; Chong, A. A practical deep reinforcement learning framework for multivariate occupant-centric control in buildings. Appl. Energy 2022, 324, 119742.
  21. Deng, X.; Zhang, Y.; Qi, H. Towards optimal HVAC control in non-stationary building environments combining active change detection and deep reinforcement learning. Build. Environ. 2022, 211, 108680.
  22. Yu, L.; Xu, Z.; Zhang, T.; Guan, X.; Yue, D. Energy-efficient personalized thermal comfort control in office buildings based on multi-agent deep reinforcement learning. Build. Environ. 2022, 223, 109458.
  23. Huang, H.; Chen, L.; Hu, E. A neural network-based multi-zone modelling approach for predictive control system design in commercial buildings. Energy Build. 2015, 97, 86–97.
  24. Sholahudin, S.; Han, H. Simplified dynamic neural network model to predict heating load of a building using Taguchi method. Energy 2016, 115, 1672–1678.
  25. Javed, A.; Larijani, H.; Ahmadinia, A.; Emmanuel, R.; Mannion, M.; Gibson, D. Design and implementation of a cloud enabled random neural network-based decentralized smart controller with intelligent sensor nodes for HVAC. IEEE Internet Things J. 2016, 4, 393–403.
  26. Chae, Y.T.; Horesh, R.; Hwang, Y.; Lee, Y.M. Artificial neural network model for forecasting sub-hourly electricity usage in commercial buildings. Energy Build. 2016, 111, 184–194.
  27. Chen, Y.; Shi, Y.; Zhang, B. Modeling and optimization of complex building energy systems with deep neural networks. In Proceedings of the 2017 51st Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, 29 October–1 November 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1368–1373.
  28. Ahmad, M.W.; Mourshed, M.; Rezgui, Y. Trees vs Neurons: Comparison between random forest and ANN for high-resolution prediction of building energy consumption. Energy Build. 2017, 147, 77–89.
  29. González-Briones, A.; Prieto, J.; De La Prieta, F.; Herrera-Viedma, E.; Corchado, J.M. Energy optimization using a case-based reasoning strategy. Sensors 2018, 18, 865.
  30. Deb, C.; Lee, S.E.; Santamouris, M. Using artificial neural networks to assess HVAC related energy saving in retrofitted office buildings. Sol. Energy 2018, 163, 32–44.
  31. Kim, Y.J. Optimal price based demand response of HVAC systems in multizone office buildings considering thermal preferences of individual occupants buildings. IEEE Trans. Ind. Inform. 2018, 14, 5060–5073.
  32. Wang, Z.; Hong, T.; Piette, M.A. Data fusion in predicting internal heat gains for office buildings through a deep learning approach. Appl. Energy 2019, 240, 386–398.
  33. Peng, Y.; Nagy, Z.; Schlüter, A. Temperature-preference learning with neural networks for occupant-centric building indoor climate controls. Build. Environ. 2019, 154, 296–308.
  34. Sendra-Arranz, R.; Gutiérrez, A. A long short-term memory artificial neural network to predict daily HVAC consumption in buildings. Energy Build. 2020, 216, 109952.
  35. Elmaz, F.; Eyckerman, R.; Casteels, W.; Latré, S.; Hellinckx, P. CNN-LSTM architecture for predictive indoor temperature modeling. Build. Environ. 2021, 206, 108327.
  36. Saepullah, A.; Wahono, R.S. Comparative analysis of mamdani, sugeno and tsukamoto method of fuzzy inference system for air conditioner energy saving. J. Intell. Syst. 2015, 1, 143–147.
  37. Keshtkar, A.; Arzanpour, S.; Keshtkar, F.; Ahmadi, P. Smart residential load reduction via fuzzy logic, wireless sensors, and smart grid incentives. Energy Build. 2015, 104, 165–180.
  38. Ulpiani, G.; Borgognoni, M.; Romagnoli, A.; Di Perna, C. Comparing the performance of on/off, PID and fuzzy controllers applied to the heating system of an energy-efficient building. Energy Build. 2016, 116, 1–17.
  39. Keshtkar, A.; Arzanpour, S. An adaptive fuzzy logic system for residential energy management in smart grid environments. Appl. Energy 2017, 186, 68–81.
  40. Ain, Q.U.; Iqbal, S.; Khan, S.A.; Malik, A.W.; Ahmad, I.; Javaid, N. IoT operating system based fuzzy inference system for home energy management system in smart buildings. Sensors 2018, 18, 2802.
  41. Li, W.; Zhang, J.; Zhao, T. Indoor thermal environment optimal control for thermal comfort and energy saving based on online monitoring of thermal sensation. Energy Build. 2019, 197, 57–67.
  42. Hussain, S.; Gabbar, H.A.; Bondarenko, D.; Musharavati, F.; Pokharel, S. Comfort-based fuzzy control optimization for energy conservation in HVAC systems. Control Eng. Pract. 2014, 32, 172–182.
  43. Wei, X.; Kusiak, A.; Li, M.; Tang, F.; Zeng, Y. Multi-objective optimization of the HVAC (heating, ventilation, and air conditioning) system performance. Energy 2015, 83, 294–306.
  44. Attaran, S.M.; Yusof, R.; Selamat, H. A novel optimization algorithm based on epsilon constraint-RBF neural network for tuning PID controller in decoupled HVAC system. Appl. Therm. Eng. 2016, 99, 613–624.
  45. Ghahramani, A.; Karvigh, S.A.; Becerik-Gerber, B. HVAC system energy optimization using an adaptive hybrid metaheuristic. Energy Build. 2017, 152, 149–161.
  46. Sala-Cardoso, E.; Delgado-Prieto, M.; Kampouropoulos, K.; Romeral, L. Activity-aware HVAC power demand forecasting. Energy Build. 2018, 170, 15–24.
  47. Satrio, P.; Mahlia, T.M.I.; Giannetti, N.; Saito, K. Optimization of HVAC system energy consumption in a building using artificial neural network and multi-objective genetic algorithm. Sustain. Energy Technol. Assess. 2019, 35, 48–57.
  48. Zhang, Z.; Chong, A.; Pan, Y.; Zhang, C.; Lam, K.P. Whole building energy model for HVAC optimal control: A practical framework based on deep reinforcement learning. Energy Build. 2019, 199, 472–490.
  49. Ren, M.; Liu, X.; Yang, Z.; Zhang, J.; Guo, Y.; Jia, Y. A novel forecasting based scheduling method for household energy management system based on deep reinforcement learning. Sustain. Cities Soc. 2022, 76, 103207.
  50. Cai, J.; Kim, D.; Jaramillo, R.; Braun, J.E.; Hu, J. A general multi-agent control approach for building energy system optimization. Energy Build. 2016, 127, 337–351.
  51. Wang, W.; Chen, J.; Huang, G.; Lu, Y. Energy efficient HVAC control for an IPS-enabled large space in commercial buildings through dynamic spatial occupancy distribution. Appl. Energy 2017, 207, 305–323.
  52. Peng, Y.; Rysanek, A.; Nagy, Z.; Schlüter, A. Occupancy learning-based demand-driven cooling control for office spaces. Build. Environ. 2017, 122, 145–160.
  53. Peng, Y.; Rysanek, A.; Nagy, Z.; Schlüter, A. Using machine learning techniques for occupancy-prediction-based cooling control in office buildings. Appl. Energy 2018, 211, 1343–1358.
  54. Li, W.; Wang, S. A multi-agent based distributed approach for optimal control of multi-zone ventilation systems considering indoor air quality and energy use. Appl. Energy 2020, 275, 115371.