Deep Reinforcement Learning-Based BEMS per Building Type

This entry is adapted from the peer-reviewed paper 10.3390/en15228663

The field of deep reinforcement learning (DRL)-based building energy management systems (BEMS) has grown rapidly in the last five years, with numerous creative ideas and innovations for integrating advanced data-driven control methods into the development of fully enabled smart buildings. Although residential buildings are by far the largest energy consumers, other building types, such as offices and educational buildings, are also being investigated. It is therefore useful to map the different research directions, types of applications, and innovative ideas being implemented for each building type. This is particularly important from a data-centric perspective, because training and using data-driven methods requires large amounts of data, especially when such systems are deployed in the real world.

Keywords: building energy demand; deep reinforcement learning; data-driven control; energy demand prediction; energy efficiency; residential building; office building; commercial building; data centre

1. Residential Buildings

As previously mentioned, residential buildings account for almost 22% of global energy demand, making them one of the most energy-consuming building types. Additionally, some types of commercial buildings are primarily used by employees during the day, especially in the post-COVID-19 era, and, depending on the work culture of the country, can therefore be more grid-friendly because their demand is better aligned with solar energy availability. Residential energy demand, in contrast, increases after work hours and peaks in the evening, which can be a sensitive period for grid operators who must compensate for the change in the supply–demand balance. This could be one underlying reason why residential buildings are receiving significant attention in this field, particularly from a demand response (DR) perspective. Table 1 presents recent research conducted on DRL-based BEMS in residential buildings.

Table 1. Recent applications of DRL-based BEMS in residential buildings.

Ref	Year	Building Study Scale	BEMS	ESS	PV	DR	DRL	Estimator	Unique Objective	Real System	Energy */Cost Saving
^[1]	2022	Single	HVAC	x	x	o	DQN	DNN	-	-	19.40%
^[2]	2022	Single	HVAC	x	x	o	DQN, DDPG	DNN	-	√	25.9–32%
^[3]	2022	Single	HVAC, EV, Appliances	o	o	o	ACKTR	Kronecker-Factored	-	-	25.37% ^▲
^[4]	2022	Single	HVAC	o	o	o	Clustering-DDPG	DNN	-	-	41%
^[5]	2022	Single	Appliances	x	o	o	DQN	DNN	Peak demand	-	30%
^[6]	2022	Single	HVAC, WHP	x	o	x	DDQN	DNN	Health	-	7–60% *^,▲
^[7]	2022	Single	HVAC, EV, Appliances	o	x	o	A2C	DNN	-	-	23%
^[8]	2022	Single	HVAC, Appliances	o	o	o	MDRL	DNN	-	-	25.80%
^[9]	2022	Single	HVAC	x	x	x	DDQN	DNN	Health	-	23.80% ^▲
^[10]	2022	Single	HVAC, EV, Appliances	o	x	o	DQN	DNN	-	-	21.30%
^[11]	2021	Single	HVAC	x	x	o	DQN	DNN	-	-	19.48%
^[12]	2021	Single	WHP	x	x	o	DQN	DNN	-	-	19–35%
^[13]	2021	Single	HVAC	x	x	o	DDQN-PER	DNN	Health	-	3.51–8.56%
^[14]	2021	Single	HVAC	x	x	o	DDPG	DNN	-	-	12.7–50% ^▲
^[15]	2021	Single	HVAC, EV, Appliances	o	o	o	TD3, DQN, DPG	DNN	-	-	5.93–12.45%
^[16]	2020	Single	Appliances	x	x	o	DQN	CNN	Peak demand	-	11.66% ^▲
^[17]	2020	Single	HVAC	x	x	o	DQN	DNN	-	-	43.89%
^[18]	2020	Single	HVAC, Battery	o	o	o	DDPG	DNN	-	-	8.10–15.21%
^[19]	2022	Single	HVAC	x	x	x	RLMPC vs. (DDQN, MPC)	DNN	-	-	-
^[20]	2022	Single	WHP, EV	o	o	o	DDPG	DNN	-	√	30% *
^[21]	2021	Single	TES	o	o	o	REINFORCE	DNN	-	-	50%
^[22]	2021	Single	HVAC	x	x	x	REINFORCE	Monte-Carlo PG	-	-	13–64%
^[23]	2020	Single	HVAC	x	x	o	DQN	DNN	-	√	21% ^▲, 30%
^[24]	2020	Multi	HVAC	x	x	x	DQN	BCNN	-	-	53% *
^[25]	2022	Multi	CHP, Boiler	x	x	o	DRLEM	-	-	-	3.30%
^[26]	2022	Multi	HVAC, Appliances, Battery	o	o	o	A2C	DNN	Peak	-	5–35%
^[27]	2022	Multi	HVAC, WHP, Appliances	o	o	o	SAC	DNN	-	-	3–7%
^[28]	2021	Multi	TES, Battery	o	o	x	MARLISA_DACC	DNN	Emissions	-	-
^[29]	2021	Multi	HVAC	x	x	x	DQN	DNN	-	-	5–12%
^[30]	2020	Multi	TES	o	o	o	SAC	DNN	Peak	-	-
^[31]	2018	Multi	HVAC, EV, Appliances	x	o	o	DQN, DDPG	DNN	Peak	-	27.40%

o: included, x: not included, * Energy saving, ^▲ Overall energy/cost saving (others are an improvement over a baseline controller). Table Abbreviations: (ACKTR) Actor-critic Kronecker-factored trust region; (A2C) Advantage actor-critic; (BCNN) Bayesian convolutional neural network; (CNN) Convolutional neural network; (DDQN-PER) Double deep Q-learning with prioritized experience replay; (DNN) Deep neural network; (EV) Electric vehicle; (RLMPC) Reinforcement learning model predictive control; (SAC) Soft actor-critic; (TES) Thermal energy storage; (TD3) Twin delayed DDPG; (WHP) Water heating pump.

As indicated in Table 1, DRL-based BEMS research can consider one or multiple buildings, either to measure the performance of DRL algorithms under different scenarios or to test a multi-agent DRL approach that manages energy flow across multiple buildings or zones simultaneously ^[27]. Glatt et al. used the decentralized actor-critic reinforcement learning algorithm MARLISA and focused on integrating a centralized critic (MARLISA_DACC) to coordinate the control of energy storage systems (ESS), such as batteries and thermal energy storage (TES), between various buildings in a manner that enhances DR performance and reduces carbon footprints ^[28]. As the scale of residential buildings increases, multi-agent approaches can learn to share information and act in a positively correlated manner, improving BEMS performance over single-agent approaches. Ahrarinouri et al. utilized distributed reinforcement learning energy management (DRLEM) to control the energy flow of combined heat and power (CHP) units and boilers between multiple buildings, where the connection between the multiple agents reduced the heat losses and costs by 18.3% and 3.3%, respectively, and increased energy sharing at peak time by 23% ^[25]. Hence, distributed and multi-agent approaches will be key methods in further research on residential neighbourhoods and buildings, where renewable energy and EVs can be coordinated between different houses to reduce renewable energy curtailment and maximize profits in peer-to-peer local energy trading hubs.

The large variety of appliances and BEMS targets is a major opportunity for deploying DRL-based BEMS in residential buildings. There is also a high potential for DR because residential buildings contribute to both morning and evening peak demands ^[32], and detached houses have space for renewable energy integration. In the recently reviewed literature in Table 1, 77% of the studies considered demand response systems, in which the varying electricity price was integrated into the objectives of the control logic, while 42% and 45% also considered the integration of ESS and PV renewable energy, respectively. Furthermore, while 74% of the systems were deployed to manage HVAC-related BEMS targets, 32% of the studies included different types of shiftable/fixable appliances, and 19% investigated the inclusion of electric vehicles (EVs). Table 1 classifies the general BEMS target systems in residential buildings, while Table 2 provides a detailed list of the appliances that were directly controlled; apart from HVAC systems and TES, notable appliances include dishwashers, washing machines, and EVs. The diversity of BEMS targets in residential buildings is considerably high, giving this building type a unique potential and research perspective. This is probably because homeowners have higher relative demand flexibility than office occupants, for example, owing to direct cost benefits. In offices, by contrast, the operating environment tends to involve higher levels of stress, and individuals receive no direct benefit from compromising their comfort because the benefit goes to business owners.
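
The shares quoted above can be re-derived from the rows of Table 1. The following is a minimal sketch, assuming each study is identified by its reference number in Table 1; the set names and the `share` helper are illustrative and are not part of the reviewed work.

```python
N_STUDIES = 31  # references [1] to [31] listed in Table 1

# Reference numbers whose Table 1 row is marked "o" in the given column
# (for EV: rows that list EV among the BEMS targets).
ess = {3, 4, 7, 8, 10, 15, 18, 20, 21, 26, 27, 28, 30}
pv = {3, 4, 5, 6, 8, 15, 18, 20, 21, 26, 27, 28, 30, 31}
dr = set(range(1, N_STUDIES + 1)) - {6, 9, 19, 22, 24, 28, 29}
ev = {3, 7, 10, 15, 20, 31}


def share(refs, total=N_STUDIES):
    """Percentage of studies in `refs`, rounded to the nearest integer."""
    return round(100 * len(refs) / total)


shares = {name: share(refs)
          for name, refs in {"DR": dr, "ESS": ess, "PV": pv, "EV": ev}.items()}
print(shares)  # {'DR': 77, 'ESS': 42, 'PV': 45, 'EV': 19}
```

The same counting applies to the other tables in this entry once their rows are transcribed in the same way.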

Table 2. Residential appliances controlled using DRL-based BEMS.

Appliance	No.	Reference
HVAC	19	^[2]^[3]^[4]^[6]^[7]^[8]^[10]^[11]^[13]^[14]^[15]^[17]^[18]^[23]^[24]^[26]^[27]^[29]^[31]
Washing machine	8	^[3]^[5]^[7]^[8]^[10]^[15]^[16]^[26]
Dishwasher	8	^[3]^[5]^[7]^[8]^[10]^[15]^[16]^[31]
Electric vehicle (EV)	6	^[3]^[7]^[10]^[15]^[20]^[31]
Water heating pump (WHP)	5	^[6]^[9]^[12]^[20]^[27]
Underfloor heating	2	^[19]^[33]
Clothes dryer	2	^[3]^[16]
Vacuum cleaner	1	^[16]
Passive heating and cooling	1	^[22]
Boiler	1	^[25]
Light	1	^[10]
Ventilation	1	^[13]
Grinder	1	^[16]

For the DRL methods, the most-utilized algorithm was DQN, while DDQN and DDPG were also notable. Many studies include a comparison between different types of DRL to determine the best method for realizing the system objectives. Meanwhile, others investigated hybrid methods, such as the mixed deep reinforcement learning (MDRL) introduced by Huang et al. ^[8], which combines DQN and DDPG for enhanced performance, and the RLMPC implemented by Arroyo et al. ^[19], which combines MPC and DDQN in a manner that leverages the benefits of both methods. Two recent unique variations of DRL were also observed. First, the actor-critic approach using the Kronecker-factored trust region (ACKTR) introduced by Chu et al. ^[3] increased the sampling efficiency and integrated discrete and continuous action spaces, exhibiting high potential. Second, the combination of clustering and DDPG developed by Zenginis et al. homogeneously partitions the training data using a clustering method and then trains a separate agent on each subset of the training data, achieving higher energy efficiency than a single agent ^[4]. While these methods are not directly related to the type of building, highlighting them can help researchers choose recently advanced implementations of DRL on the basis of their application and building type. Finally, DNNs have been the most used value/policy function estimators, whereas very few studies used other methods, such as CNNs. In general, owing to the mixed types of state variables, DNNs can effectively map state–action spaces and can be considered the default estimator; however, this also indicates that there is potential for testing other estimators.
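
The "partition the data, then train one agent per subset" idea described above can be sketched as follows. This is only an illustration under stated assumptions and not the implementation of Zenginis et al.: the clustering method is a plain k-means with a deterministic farthest-first initialisation, `train_agent` is a hypothetical placeholder standing in for DRL training such as DDPG, and the sample data are invented.

```python
from math import dist
from statistics import fmean


def partition_by_kmeans(samples, k, iters=10):
    """Split `samples` (tuples of floats) into k groups with plain k-means."""
    # Farthest-first initialisation keeps this toy example deterministic.
    centroids = [samples[0]]
    while len(centroids) < k:
        centroids.append(
            max(samples, key=lambda s: min(dist(s, c) for c in centroids)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for s in samples:
            nearest = min(range(k), key=lambda j: dist(s, centroids[j]))
            groups[nearest].append(s)
        # Recompute each centroid; an empty group keeps its previous centroid.
        centroids = [tuple(fmean(dim) for dim in zip(*g)) if g else c
                     for g, c in zip(groups, centroids)]
    return groups


def train_agent(subset):
    """Hypothetical stand-in for DRL training; it only records the data size."""
    return {"n_training_samples": len(subset)}


# Invented (daily mean load in kW, daily mean outdoor temperature in °C) pairs.
days = [(1.0, 25.0), (1.2, 24.0), (3.9, 4.0), (4.1, 5.0), (1.1, 26.0)]
agents = [train_agent(group) for group in partition_by_kmeans(days, k=2)]
print(agents)
```

In a real setting the features would normally be scaled before clustering, and each subset would feed a full training loop rather than a placeholder.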

The primary objectives of most BEMS are typically the same: maintaining comfort and reducing energy use or cost. Energy and cost are highly correlated, so a reduction in one generally implies a reduction in the other; however, studies report their primary improvement in terms of either energy or cost depending on whether DR, and hence an energy price analysis, is considered. Other secondary objectives highlighted by some studies include health factors, such as indoor CO₂ levels, and the reduction of peak demand. The reported savings usually refer to the improvement over a rule-based baseline controller or to a comparison between single- and multiple-agent methods. Hence, the high energy-saving percentages do not necessarily represent the overall energy reduction, making it harder to cross-compare studies based on these numbers. Nevertheless, they highlight the energy-saving advantages of utilizing DRL in residential buildings. Finally, real implementations are significantly lacking: only three of the 31 studies (<10%) validated their models outside a simulation environment, which highlights a clear research gap.
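
To make the interplay between cost and comfort concrete, the following is a minimal sketch that assumes a weighted-sum reward. It is not taken from any of the reviewed studies; the function name, the comfort band, and the weight are hypothetical values chosen for illustration.

```python
def step_reward(energy_kwh, price_per_kwh, indoor_temp_c,
                comfort_band_c=(20.0, 24.0), discomfort_weight=0.5):
    """Negative of (energy cost + weighted comfort-band violation).

    All default values are hypothetical and chosen only for illustration.
    """
    low, high = comfort_band_c
    # Degrees by which the indoor temperature leaves the comfort band.
    violation = max(low - indoor_temp_c, 0.0) + max(indoor_temp_c - high, 0.0)
    return -(energy_kwh * price_per_kwh + discomfort_weight * violation)


r_comfortable = step_reward(energy_kwh=2.0, price_per_kwh=0.25, indoor_temp_c=22.0)
r_too_warm = step_reward(energy_kwh=1.0, price_per_kwh=0.25, indoor_temp_c=26.0)
print(r_comfortable, r_too_warm)  # -0.5 -1.25
```

With a constant price the cost term is simply proportional to energy use, which is one way to see why the two objectives are highly correlated; under DR the price varies over time, so shifting consumption can lower cost without lowering energy.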

2. Office Buildings

Office buildings face the challenge of a limited variety of controllable appliances apart from HVAC systems and, because they are mainly located in cities and in high-rise buildings, of limited space for installing renewable energy. With these facts in mind, recent applications of DRL-based BEMS in offices are listed in Table 3.

Table 3. Recent applications of DRL-based BEMS in office buildings.

Ref	Year	BEMS	ESS	PV	DR	DRL	Estimator	Unique Objective	Real System	Energy */Cost Savings
^[34]	2022	CHP, Battery, PV	o	o	x	DDPG	DNN	-	-	-
^[35]	2022	HVAC	x	x	x	DQN	DNN	-	-	-
^[36]	2022	HVAC, TES	o	o	o	SAC	DNN	Self-consumption/Sufficiency	-	39.5–84.3%
^[37]	2022	HVAC	x	x	x	PPO	DNN	-	-	48.97% *
^[38]	2022	HVAC, PCSs	x	x	x	MAAC		-	-	0.7–4.18% *^,▲
^[39]	2022	Chiller, TES	o	x	o	SAC	DNN	Discomfort	-	-
^[40]	2022	Battery, fan coil units	o	o	o	Dueling DQN	DNN	Discomfort	-	8%
^[41]	2022	HVAC	x	x	x	A3C	DNN	-	-	16.10% *
^[42]	2022	HVAC	x	x	x	A3C	DNN	-	-	12.80% *
^[43]	2022	HVAC	x	x	x	BDQ	DNN	-	-	14% *^,▲
^[44]	2022	HVAC	x	x	x	PPO, A2C	DNN	Discomfort	-	4–22% *
^[45]	2022	HVAC	x	x	x	DQN	DNN	Emissions	-	-
^[46]	2021	HVAC	x	x	x	DQN	DNN	-	-	6% *
^[47]	2021	HVAC	x	x	x	DQN	DNN	Health	-	-
^[48]	2021	HVAC	x	x	o	SAC	DNN	-	-	9.70%
^[49]	2021	HVAC, Blind	x	x	x	BDQN, SAC, PPO	DNN	-	-	11.0–31.8%
^[50]	2021	EV	x	o	o	PPO	DNN	-	-	62.5% ^▲
^[51]	2021	HVAC	x	x	x	PPO	DNN	-	-	4.5–13.2%
^[52]	2021	HVAC	x	x	x	SAC	DNN	Temperature violation	-	-
^[53]	2021	HVAC, Battery	o	o	o	DDPG	DNN	-	-	39.60%
^[54]	2020	HVAC	x	x	x	DQN	DNN	Health	-	15.70% *
^[55]	2020	Water Heating	x	x	x	DDQN	DNN	-	-	5–12% ^▲
^[56]	2020	HVAC, Battery, EV, EWH	o	o	o	DQN	DNN	-	-	-
^[57]	2020	HVAC	x	x	x	DDPG	DNN	-	-	27–30% ^▲
^[58]	2019	HVAC, Light, Blind	x	x	x	BDQ	DNN	-	-	8.1–14.26%
^[59]	2019	HVAC	x	x	x	DQN	DNN	-	-	12.4–32.2% *
^[60]	2019	HVAC	x	x	x	A3C	DNN	-	√	16.70% *
^[61]	2018	HVAC	x	x	x	A3C	DNN	-	√	16.6–18.2% *
^[62]	2018	HVAC	x	x	x	A3C	DNN	-	√	15% ^▲

o: included, x: not included, * Energy saving, ^▲ Overall energy/cost saving (others are an improvement over a baseline controller). Table Abbreviations: (A3C) Asynchronous advantage actor-critic; (BDQ) Branching-dueling Q-network; (CHP) Combined heat and power; (EWH) Electric water heater; (MAAC) Multi-agent actor-critic; (PCS) Personal comfort systems.

The number of recent office building-related studies is comparable to that of residential buildings. The first difference can be noticed in the appliance category, which is primarily related to HVAC systems. Only two studies investigated EVs, and few other control targets were investigated, such as TES, blind control, light control, and personal comfort systems (PCSs). HVAC systems are the main energy consumers in offices and have the flexibility and potential to save energy. In addition to HVAC control, recent innovations can be found for BEMS integrated with EVs. Liang et al. included EVs in their BEMS, which utilized a safe reinforcement learning (SRL) strategy to mitigate the effect of extreme weather events and increase building resilience and proactivity ^[56]. Meanwhile, Mbuwir et al. used EVs as the core and only BEMS target in an office building and revealed that, by utilizing a multi-agent DRL, a promising saving potential of up to 62.5% can be achieved ^[50]. Furthermore, only 24% of the studies considered DR systems, and only 21% included PV or energy storage systems.

The DRL methods utilized in office buildings are more diversified than those observed in residential buildings and include the asynchronous advantage actor-critic (A3C) and the soft actor-critic (SAC). These methods have shown improved performance over rule-based baseline controllers, although one downside is that they have not always been compared with other DRL methods. Zhang et al. introduced a branching–dueling Q-network (BDQN) and compared it with both PPO and SAC; they reported that BDQN converged to the highest reward, followed by SAC, with both revealing higher sample complexity than their counterpart, and that they ran slower than PPO but consumed less memory. Hence, this revealed a trade-off between time, RAM usage, and reward. Another comparison, between the advantage actor-critic (A2C) and PPO, was conducted by Lee et al., where A2C exhibited better performance ^[44]. Such comparisons are useful in guiding researchers to choose the best subset of algorithms from the current large pool of DRL algorithms.

A critical observation related to office buildings is the significance of indoor thermal comfort for maintaining high worker productivity. This can be observed in the four studies that highlighted the reduction of discomfort or temperature violations as a system objective. Because DR is included less often in office BEMS, more studies report energy savings rather than cost savings compared with residential buildings. Finally, only three studies, conducted by Zhang et al., implemented and validated their models in real systems ^[60]^[61]^[62].

3. Educational Buildings

Table 4 lists recent research on educational buildings, which are mainly either schools or university facilities and laboratories. The BEMS targets were primarily HVAC systems; one study investigated TES control, and another controlled ventilation through windows and air cleaners. Only two recent works included demand response systems with integrated energy storage, mainly TES. As for the objectives, health was considered by An et al., who deployed DQN to control ventilation in two laboratory rooms to reduce economic loss and PM2.5-related health risks ^[63]. This is an interesting co-benefit perspective that quantifies not only energy and cost reduction but also the impact on human health, and integrates the findings into the BEMS objective. Furthermore, Chemingui et al. included the reduction of indoor contamination as a core target of their BEMS. This was realized by optimizing the HVAC system managing 21 zones in a school model, achieving a 44% increase in thermal comfort, a 21% reduction in energy consumption, and a low indoor CO₂ concentration ^[64].

Considering real implementations, three studies conducted real model validation: one in a laboratory setting, one in a university building, and another in a school setting. Laboratories are suitable for real-system validation, although acquiring data to train the agent can be challenging if the data do not already exist. An et al. first conducted an offline training phase based on an apartment model coupled with particle dynamics for PM2.5 modelling, after which the trained agent was tested in a laboratory room with different PM2.5 levels ^[63]. Schmidt et al. conducted a 43-day experiment in a Spanish school by deploying a BEMS utilizing fitted Q-iteration and a Bayesian regularized neural network coupled with genetic optimization. They confirmed that, when comfort levels similar to the reference period were maintained, energy consumption decreased by almost 33%, and when higher comfort was prioritized, only a 5% energy increase was observed ^[65].

Table 4. Recent applications of DRL-based BEMS in educational buildings.

Ref	Year	Type	Scale	BEMS	ESS	PV	DR	DRL	Estimator	Unique Objective	Real System	Energy */Cost Savings
^[66]	2022	University	Single	HVAC	o	x	o	PPO-Clip	DNN	-	-	9.17%
^[67]	2022	University	Multi	TES	o	x	o	SAC	DNN	Load-Factor	-	6.72%
^[63]	2022	University	Lab.	Ventilation	x	x	x	DQN	DNN	Health	√	2.4–43.7%
^[68]	2022	University	Single	HVAC	x	x	x	SAC	DNN	-	√	-
^[69]	2021	University	Multi	HVAC	x	x	x	DDPG	DNN	-	-	15.40% *^,▲
^[64]	2020	School	Single	HVAC	x	x	x	DDPG	DNN	Health	-	21% *^,▲
^[70]	2020	University	Single	HVAC	x	x	x	PPO	DNN	-	-	10.80% *
^[65]	2017	School	Single	HVAC	x	x	x	fitted Q-iteration	-	-	√	33% *^,▲

o: included, x: not included, * Energy saving, ^▲ Overall energy/cost saving (others are an improvement over a baseline controller).

Finally, a recent innovative idea introduced by Zhou et al. combines DRL with deep learning for building energy prediction. It was not included in Table 4 because it is indirectly related to the BEMS. They utilized DDPG to add an additional learning layer to an LSTM forecaster by having the agent learn to tune the hyperparameters of the LSTM as new training data arrive. They demonstrated that when there is a high variation in the new training data, the prediction accuracy can be increased by up to 23.5% ^[71].

4. Data Centres

As listed in Table 5, few studies have investigated data centres. The reviewed BEMS do not consider DR, renewable energy, or storage systems and are primarily focused on HVAC systems. In general, the main objective of the BEMS is to lower energy demand while meeting operational constraints. Whereas comfort can be slightly compromised in other building types, the operational efficiency of data centres is a more sensitive system target because degrading it can compromise the data centre's main operation.

Table 5. Recent applications of DRL-based BEMS in data centres.

Ref	Year	BEMS	ESS	PV	DR	DRL	Estimator	Unique Objective	Real System	Overall Energy Saving
^[72]	2022	HVAC	x	x	x	SAC	DNN	Operation	-	3–5.5%
^[73]	2022	HPC/AI Cluster	x	x	x	DQN	DNN	Operation	√	40%
^[74]	2021	HVAC	x	x	x	SAC, PPO, TD3, TRPO	DNN	Operation	-	10%
^[75]	2019	HVAC	x	x	x	DQN	DNN	Operation	-	-
^[76]	2019	HVAC	x	x	x	Model-Based DRL, PPO	DNN	Operation	-	17.1–21.8%

o: included, x: not included. Table Abbreviations: (TRPO) Trust Region Policy Optimization.

One unique study implemented by Narantuya et al. utilized a multi-agent DRL (mDRL) based on a DQN to optimize computational resource allocation in high-performance computing (HPC)/AI systems. Their system was further deployed in real time, reducing the task completion time by 20% and the energy consumption by 40% ^[73]. Finally, Beimann et al. conducted a comparative analysis of four different DRL methods for the control of a simulated HVAC system of a data centre. Their computational experiments revealed that SAC has exceptionally high sample efficiency, reaching stable performance with 10 times less data than PPO, TRPO, and TD3; hence, it is recommended for future utilization, particularly in noisy environments. Moreover, it was reported that all models can achieve an energy reduction of approximately 10% in comparison to a baseline controller ^[74].

5. Other Commercial Buildings

Finally, Table 6 includes commercial buildings that are not classified as educational buildings, offices, or data centres. Such buildings are described in the studies as generic commercial buildings, storehouses, industrial parks, or a mix of retail and restaurant buildings, offices, and residential buildings ^[77]^[78].

Table 6. Recent applications of DRL-based BEMS in other commercial buildings.

Ref	Year	Scale	BEMS	ESS	PV	DR	DRL	Estimator	Unique Objective	Real System	Energy */Cost Savings
^[79]	2022	Single	HVAC	x	x	x	MA-CWSC, DQN	DNN	-	-	11.10% *
^[80]	2022	Storehouse	HVAC	x	x	x	DDQN	DNN	-	-	34.20% *
^[81]	2022	Industrial Park	HVAC	x	x	o	Dueling SAC	DNN	-	-	2.80% ^▲
^[77]	2022	Multi	HVAC, WHP, Inverter, Battery	o	o	o	PPO	DNN	Over/Under voltage	-	-
^[82]	2022	Single	HVAC	x	x	x	DDQN	DNN	-	-	50% *
^[83]	2021	Single	HVAC	x	x	o	MAAC	DNN	Health	-	56.50–75.25%
^[78]	2021	Multi	HVAC, TES	o	o	o	SAC	DNN	-	-	7% *, 4%
^[84]	2021	Multi	HVAC, TES	o	o	o	SAC	DNN	Peak	-	23% ^▲
^[85]	2020	Single	HVAC	x	x	o	A3C, Apex-DQN	DNN	-	-	-
^[86]	2019	Single	HVAC	x	x	o	PPO	DNN	-	-	22% *

o: included, x: not included, * Energy saving, ^▲ Overall energy/cost saving (others are an improvement over a baseline controller). Table Abbreviations: (MA-CWSC) Multi-agent deep reinforcement learning method for building cooling water system control.

All of the studies listed in Table 6 investigated HVAC systems as the main BEMS target, while two studies included TES and one considered WHP and renewable energy inverters. DR systems were also included in seven studies, particularly in those with larger scales, such as industrial parks or multiple buildings. One notable method was the dueling SAC-based memory-augmented DRL introduced by Zhao et al. to overcome the limitation of time lag in the district heating system of an industrial park. Their novel methodology reduced the energy costs by 2.8% ^[81]. Furthermore, two multi-agent approaches were observed. First, Fu et al. utilized a multi-agent DRL method for cooling water system control (MA-CWSC) to control the frequency of the cooling tower and cooling water pump in many chillers. Compared with the single-agent DQN, the proposed model had faster training and a simpler action space, resulting in an 11.1% energy saving over the rule-based baseline ^[79]. Second, Yu et al. introduced a multi-agent actor-critic (MAAC) algorithm for a multi-zone HVAC system. Their objective was not only to minimize energy costs but also to reduce the indoor CO₂ concentration in the building ^[83].

In terms of secondary objectives, Pigott et al. considered voltage regulation for a simulated IEEE 33-bus system connected to nine buildings. The building types are diverse and include 37 fast-food restaurants, four medium offices, five retail stores, a mall, and 145 residential houses. These models were based on the recent CityLearn framework, which is a platform dedicated to multi-agent models in smart grids and hence contains both building and power-flow models. Utilizing multiple DRL agents, their model nominally reduced the undervoltage and overvoltage occurrences by 34% ^[77]. Moreover, Pinto et al. considered both peak demand and peak-to-average ratio, which were reduced by 23% and 20%, respectively, by using a centralized SAC agent controlling four different building types (small/medium offices, retail, and restaurant) ^[84]. Finally, in terms of real system validation, none was observed.
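
The peak demand and peak-to-average ratio mentioned above can be computed from a load profile as in the following minimal sketch. The two profiles are invented for illustration and are not data from Pinto et al.; the function name is likewise an assumption.

```python
from statistics import fmean


def peak_and_par(load_kw):
    """Return (peak demand, peak-to-average ratio) of a load profile."""
    peak = max(load_kw)
    return peak, peak / fmean(load_kw)


# Invented district load samples (kW) before and after coordinated control.
baseline = [2.0, 2.0, 4.0, 8.0]
controlled = [3.0, 3.0, 4.0, 6.0]

print(peak_and_par(baseline))    # (8.0, 2.0)
print(peak_and_par(controlled))  # (6.0, 1.5)
```

In this toy case the total energy is unchanged, so the peak and the ratio fall by the same fraction; when the controller also changes the average load, the two reductions differ.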

References

Blad, C.; Bøgh, S.; Kallesøe, C.S. Data-Driven Offline Reinforcement Learning for HVAC-Systems. Energy 2022, 261, 125290.
Du, Y.; Li, F.; Kurte, K.; Munk, J.; Zandi, H. Demonstration of Intelligent HVAC Load Management with Deep Reinforcement Learning: Real-World Experience of Machine Learning in Demand Control. IEEE Power Energy Mag. 2022, 20, 42–53.
Chu, Y.; Wei, Z.; Sun, G.; Zang, H.; Chen, S.; Zhou, Y. Optimal Home Energy Management Strategy: A Reinforcement Learning Method with Actor-Critic Using Kronecker-Factored Trust Region. Electr. Power Syst. Res. 2022, 212, 108617.
Zenginis, I.; Vardakas, J.; Koltsaklis, N.E.; Verikoukis, C. Smart Home’s Energy Management Through a Clustering-Based Reinforcement Learning Approach. IEEE Internet Things J. 2022, 9, 16363–16371.
Lu, J.; Mannion, P.; Mason, K. A Multi-Objective Multi-Agent Deep Reinforcement Learning Approach to Residential Appliance Scheduling. IET Smart Grid 2022, 5, 260–280.
Heidari, A.; Maréchal, F.; Khovalyg, D. Reinforcement Learning for Proactive Operation of Residential Energy Systems by Learning Stochastic Occupant Behavior and Fluctuating Solar Energy: Balancing Comfort, Hygiene and Energy Use. Appl. Energy 2022, 318, 119206.
Shuvo, S.S.; Yilmaz, Y. Home Energy Recommendation System (HERS): A Deep Reinforcement Learning Method Based on Residents’ Feedback and Activity. IEEE Trans. Smart Grid 2022, 13, 2812–2821.
Huang, C.; Zhang, H.; Wang, L.; Luo, X.; Song, Y. Mixed Deep Reinforcement Learning Considering Discrete-Continuous Hybrid Action Space for Smart Home Energy Management. J. Mod. Power Syst. Clean Energy 2022, 10, 743–754.
Heidari, A.; Maréchal, F.; Khovalyg, D. An Occupant-Centric Control Framework for Balancing Comfort, Energy Use and Hygiene in Hot Water Systems: A Model-Free Reinforcement Learning Approach. Appl. Energy 2022, 312, 118833.
Forootani, A.; Rastegar, M.; Jooshaki, M. An Advanced Satisfaction-Based Home Energy Management System Using Deep Reinforcement Learning. IEEE Access 2022, 10, 47896–47905.
Kurte, K.; Amasyali, K.; Munk, J.; Zandi, H. Comparative analysis of model-free and model-based HVAC control for residential demand response. In Proceedings of the 8th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, Coimbra, Portugal, 17–18 November 2021; ACM: New York, NY, USA, 2021; pp. 309–313.
Amasyali, K.; Munk, J.; Kurte, K.; Kuruganti, T.; Zandi, H. Deep Reinforcement Learning for Autonomous Water Heater Control. Buildings 2021, 11, 548.
Yang, T.; Zhao, L.; Li, W.; Wu, J.; Zomaya, A.Y. Towards Healthy and Cost-Effective Indoor Environment Management in Smart Homes: A Deep Reinforcement Learning Approach. Appl. Energy 2021, 300, 117335.
Liu, B.; Akcakaya, M.; McDermott, T.E. Automated Control of Transactive HVACs in Energy Distribution Systems. IEEE Trans. Smart Grid 2021, 12, 2462–2471.
Ye, Y.; Qiu, D.; Wang, H.; Tang, Y.; Strbac, G. Real-Time Autonomous Residential Demand Response Management Based on Twin Delayed Deep Deterministic Policy Gradient Learning. Energies 2021, 14, 531.
Mathew, A.; Roy, A.; Mathew, J. Intelligent Residential Energy Management System Using Deep Reinforcement Learning. IEEE Syst. J. 2020, 14, 5362–5372.
McKee, E.; Du, Y.; Li, F.; Munk, J.; Johnston, T.; Kurte, K.; Kotevska, O.; Amasyali, K.; Zandi, H. Deep reinforcement learning for residential HVAC control with consideration of human occupancy. In Proceedings of the 2020 IEEE Power & Energy Society General Meeting (PESGM), Montreal, QC, Canada, 2–6 August 2020; pp. 1–5.
Yu, L.; Xie, W.; Xie, D.; Zou, Y.; Zhang, D.; Sun, Z.; Zhang, L.; Zhang, Y.; Jiang, T. Deep Reinforcement Learning for Smart Home Energy Management. IEEE Internet Things J. 2020, 7, 2751–2762.
Arroyo, J.; Manna, C.; Spiessens, F.; Helsen, L. Reinforced Model Predictive Control (RL-MPC) for Building Energy Management. Appl. Energy 2022, 309, 118346.
Svetozarevic, B.; Baumann, C.; Muntwiler, S.; di Natale, L.; Zeilinger, M.N.; Heer, P. Data-Driven Control of Room Temperature and Bidirectional EV Charging Using Deep Reinforcement Learning: Simulations and Experiments. Appl. Energy 2022, 307, 118127.
Zsembinszki, G.; Fernández, C.; Vérez, D.; Cabeza, L.F.; Cannavale, A.; Martellotta, F.; Fiorito, F. Deep Learning Optimal Control for a Complex Hybrid Energy Storage System. Buildings 2021, 11, 194.
Park, B.; Rempel, A.R.; Lai, A.K.L.; Chiaramonte, J.; Mishra, S. Reinforcement Learning for Control of Passive Heating and Cooling in Buildings. IFAC-PapersOnLine 2021, 54, 907–912.
Kurte, K.; Munk, J.; Kotevska, O.; Amasyali, K.; Smith, R.; McKee, E.; Du, Y.; Cui, B.; Kuruganti, T.; Zandi, H. Evaluating the Adaptability of Reinforcement Learning Based HVAC Control for Residential Houses. Sustainability 2020, 12, 7727.
Lork, C.; Li, W.T.; Qin, Y.; Zhou, Y.; Yuen, C.; Tushar, W.; Saha, T.K. An Uncertainty-Aware Deep Reinforcement Learning Framework for Residential Air Conditioning Energy Management. Appl. Energy 2020, 276, 115426.
Ahrarinouri, M.; Rastegar, M.; Karami, K.; Seifi, A.R. Distributed Reinforcement Learning Energy Management Approach in Multiple Residential Energy Hubs. Sustain. Energy Grids Netw. 2022, 32, 100795.
Lee, S.; Choi, D.H. Federated Reinforcement Learning for Energy Management of Multiple Smart Homes with Distributed Energy Resources. IEEE Trans. Ind. Inf. 2022, 18, 488–497.
Pinto, G.; Kathirgamanathan, A.; Mangina, E.; Finn, D.P.; Capozzoli, A. Enhancing Energy Management in Grid-Interactive Buildings: A Comparison among Cooperative and Coordinated Architectures. Appl. Energy 2022, 310, 118497.
Glatt, R.; da Silva, F.L.; Soper, B.; Dawson, W.A.; Rusu, E.; Goldhahn, R.A. Collaborative energy demand response with decentralized actor and centralized critic. In Proceedings of the 8th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, Coimbra, Portugal, 17–18 November 2021; ACM: New York, NY, USA, 2021; pp. 333–337.
Gupta, A.; Badr, Y.; Negahban, A.; Qiu, R.G. Energy-Efficient Heating Control for Smart Buildings with Deep Reinforcement Learning. J. Build. Eng. 2021, 34, 101739.
Kathirgamanathan, A.; Twardowski, K.; Mangina, E.; Finn, D.P. A Centralised soft actor critic deep reinforcement learning approach to district demand side management through CityLearn. In Proceedings of the 1st International Workshop on Reinforcement Learning for Energy Management in Buildings & Cities, Online, 17 November 2020; ACM: New York, NY, USA, 2020; pp. 11–14.
Mocanu, E.; Mocanu, D.C.; Nguyen, P.H.; Liotta, A.; Webber, M.E.; Gibescu, M.; Slootweg, J.G. On-Line Building Energy Optimization Using Deep Reinforcement Learning. IEEE Trans. Smart Grid 2019, 10, 3698–3708.
Torriti, J.; Zhao, X.; Yuan, Y. The Risk of Residential Peak Electricity Demand: A Comparison of Five European Countries. Energies 2017, 10, 385.
Denyer, D.; Tranfield, D.; van Aken, J.E. Developing Design Propositions through Research Synthesis. Organ. Stud. 2008, 29, 393–413.
Gao, Y.; Matsunami, Y.; Miyata, S.; Akashi, Y. Operational Optimization for Off-Grid Renewable Building Energy System Using Deep Reinforcement Learning. Appl. Energy 2022, 325, 119783.
Fang, X.; Gong, G.; Li, G.; Chun, L.; Peng, P.; Li, W.; Shi, X.; Chen, X. Deep Reinforcement Learning Optimal Control Strategy for Temperature Setpoint Real-Time Reset in Multi-Zone Building HVAC System. Appl. Therm. Eng. 2022, 212, 118552.
Brandi, S.; Gallo, A.; Capozzoli, A. A Predictive and Adaptive Control Strategy to Optimize the Management of Integrated Energy Systems in Buildings. Energy Rep. 2022, 8, 1550–1567.
Zhang, T.; Aakash Krishna, G.S.; Afshari, M.; Musilek, P.; Taylor, M.E.; Ardakanian, O. Diversity for transfer in learning-based control of buildings. In Proceedings of the Thirteenth ACM International Conference on Future Energy Systems, Online, 28 June–1 July 2022; ACM: New York, NY, USA, 2022; pp. 556–564.
Yu, L.; Xu, Z.; Zhang, T.; Guan, X.; Yue, D. Energy-Efficient Personalized Thermal Comfort Control in Office Buildings Based on Multi-Agent Deep Reinforcement Learning. Build. Environ. 2022, 223, 109458.
Brandi, S.; Fiorentini, M.; Capozzoli, A. Comparison of Online and Offline Deep Reinforcement Learning with Model Predictive Control for Thermal Energy Management. Autom. Constr. 2022, 135, 104128.
Shen, R.; Zhong, S.; Wen, X.; An, Q.; Zheng, R.; Li, Y.; Zhao, J. Multi-Agent Deep Reinforcement Learning Optimization Framework for Building Energy System with Renewable Energy. Appl. Energy 2022, 312, 118724.
Zhang, W.; Zhang, Z. Energy Efficient Operation Optimization of Building Air-Conditioners via Simulator-Assisted Asynchronous Reinforcement Learning. IOP Conf. Ser. Earth Environ. Sci. 2022, 1048, 012006.
Zhong, X.; Zhang, Z.; Zhang, R.; Zhang, C. End-to-End Deep Reinforcement Learning Control for HVAC Systems in Office Buildings. Designs 2022, 6, 52.
Lei, Y.; Zhan, S.; Ono, E.; Peng, Y.; Zhang, Z.; Hasama, T.; Chong, A. A Practical Deep Reinforcement Learning Framework for Multivariate Occupant-Centric Control in Buildings. Appl. Energy 2022, 324, 119742.
Lee, J.Y.; Rahman, A.; Huang, S.; Smith, A.D.; Katipamula, S. On-Policy Learning-Based Deep Reinforcement Learning Assessment for Building Control Efficiency and Stability. Sci. Technol. Built Environ. 2022, 28, 1150–1165.
Marzullo, T.; Dey, S.; Long, N.; Leiva Vilaplana, J.; Henze, G. A High-Fidelity Building Performance Simulation Test Bed for the Development and Evaluation of Advanced Controls. J. Build. Perform. Simul. 2022, 15, 379–397.
Verma, S.; Agrawal, S.; Venkatesh, R.; Shrotri, U.; Nagarathinam, S.; Jayaprakash, R.; Dutta, A. EImprove—Optimizing energy and comfort in buildings based on formal semantics and reinforcement learning. In Proceedings of the 58th ACM/IEEE Design Automation Conference (DAC), Online, 5–9 December 2021; pp. 157–162.
Jneid, K.; Ploix, S.; Reignier, P.; Jallon, P. Deep Q-network boosted with external knowledge for HVAC control. In Proceedings of the 8th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, Coimbra, Portugal, 17–18 November 2021; ACM: New York, NY, USA, 2021; pp. 329–332.
Kathirgamanathan, A.; Mangina, E.; Finn, D.P. Development of a Soft Actor Critic Deep Reinforcement Learning Approach for Harnessing Energy Flexibility in a Large Office Building. Energy AI 2021, 5, 100101.
Zhang, T.; Baasch, G.; Ardakanian, O.; Evins, R. On the joint control of multiple building systems with reinforcement learning. In Proceedings of the Twelfth ACM International Conference on Future Energy Systems, Online, 28 June–1 July 2021; ACM: New York, NY, USA, 2021; pp. 60–72.
Mbuwir, B.V.; Vanmunster, L.; Thoelen, K.; Deconinck, G. A Hybrid Policy Gradient and Rule-Based Control Framework for Electric Vehicle Charging. Energy AI 2021, 4, 100059.
Zhang, X.; Chintala, R.; Bernstein, A.; Graf, P.; Jin, X. Grid-interactive multi-zone building control using reinforcement learning with global-local policy search. In Proceedings of the American Control Conference (ACC), Online, 25–28 May 2021; pp. 4155–4162.
Coraci, D.; Brandi, S.; Piscitelli, M.S.; Capozzoli, A. Online Implementation of a Soft Actor-Critic Agent to Enhance Indoor Temperature Control and Energy Efficiency in Buildings. Energies 2021, 14, 997.
Touzani, S.; Prakash, A.K.; Wang, Z.; Agarwal, S.; Pritoni, M.; Kiran, M.; Brown, R.; Granderson, J. Controlling Distributed Energy Resources via Deep Reinforcement Learning for Load Flexibility and Energy Efficiency. Appl. Energy 2021, 304, 117733.
Ahn, K.U.; Park, C.S. Application of Deep Q-Networks for Model-Free Optimal Control Balancing between Different HVAC Systems. Sci. Technol. Built Environ. 2020, 26, 61–74.
Brandi, S.; Piscitelli, M.S.; Martellacci, M.; Capozzoli, A. Deep Reinforcement Learning to Optimise Indoor Temperature Control and Heating Energy Consumption in Buildings. Energy Build. 2020, 224, 110225.
Liang, Z.; Huang, C.; Su, W.; Duan, N.; Donde, V.; Wang, B.; Zhao, X. Safe Reinforcement Learning-Based Resilient Proactive Scheduling for a Commercial Building Considering Correlated Demand Response. IEEE Open Access J. Power Energy 2021, 8, 85–96.
Zou, Z.; Yu, X.; Ergan, S. Towards Optimal Control of Air Handling Units Using Deep Reinforcement Learning and Recurrent Neural Network. Build. Environ. 2020, 168, 106535.
Ding, X.; Du, W.; Cerpa, A. OCTOPUS: Deep reinforcement learning for holistic smart building control. In Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, New York, NY, USA, 13–14 November 2019; ACM: New York, NY, USA, 2019; pp. 326–335.
Yoon, Y.R.; Moon, H.J. Performance Based Thermal Comfort Control (PTCC) Using Deep Reinforcement Learning for Space Cooling. Energy Build. 2019, 203, 109420.
Zhang, Z.; Chong, A.; Pan, Y.; Zhang, C.; Lam, K.P. Whole Building Energy Model for HVAC Optimal Control: A Practical Framework Based on Deep Reinforcement Learning. Energy Build. 2019, 199, 472–490.
Zhang, Z.; Lam, K.P. Practical implementation and evaluation of deep reinforcement learning control for a radiant heating system. In Proceedings of the 5th Conference on Systems for Built Environments, Shenzhen, China, 7–8 November 2018; ACM: New York, NY, USA, 2018; pp. 148–157.
Zhang, Z.; Chong, A.; Pan, Y.; Zhang, C.; Lu, S.; Lam, K. A deep reinforcement learning approach to using whole building energy model for HVAC optimal control. In Proceedings of the ASHRAE/IBPSA-USA Building Performance Analysis Conference and SimBuild, Chicago, IL, USA, 26–28 September 2018.
An, Y.; Niu, Z.; Chen, C. Smart Control of Window and Air Cleaner for Mitigating Indoor PM2.5 with Reduced Energy Consumption Based on Deep Reinforcement Learning. Build. Environ. 2022, 224, 109583.
Chemingui, Y.; Gastli, A.; Ellabban, O. Reinforcement Learning-Based School Energy Management System. Energies 2020, 13, 6354.
Schmidt, M.; Moreno, M.V.; Schülke, A.; Macek, K.; Mařík, K.; Pastor, A.G. Optimizing Legacy Building Operation: The Evolution into Data-Driven Predictive Cyber-Physical Systems. Energy Build. 2017, 148, 257–279.
Li, Z.; Sun, Z.; Meng, Q.; Wang, Y.; Li, Y. Reinforcement Learning of Room Temperature Set-Point of Thermal Storage Air-Conditioning System with Demand Response. Energy Build. 2022, 259, 111903.
Qin, Y.; Ke, J.; Wang, B.; Filaretov, G.F. Energy Optimization for Regional Buildings Based on Distributed Reinforcement Learning. Sustain. Cities Soc. 2022, 78, 103625.
Jung, S.; Jeoung, J.; Hong, T. Occupant-Centered Real-Time Control of Indoor Temperature Using Deep Learning Algorithms. Build. Environ. 2022, 208, 108633.
Li, J.; Zhang, W.; Gao, G.; Wen, Y.; Jin, G.; Christopoulos, G. Toward Intelligent Multizone Thermal Control with Multiagent Deep Reinforcement Learning. IEEE Internet Things J. 2021, 8, 11150–11162.
Naug, A.; Quiñones-Grueiro, M.; Biswas, G. Continual adaptation in deep reinforcement learning-based control applied to non-stationary building environments. In Proceedings of the 1st International Workshop on Reinforcement Learning for Energy Management in Buildings & Cities, Online, 17 November 2020; ACM: New York, NY, USA, 2020; pp. 24–28.
Zhou, X.; Lin, W.; Kumar, R.; Cui, P.; Ma, Z. A Data-Driven Strategy Using Long Short Term Memory Models and Reinforcement Learning to Predict Building Electricity Consumption. Appl. Energy 2022, 306, 118078.
Bin Mahbod, M.H.; Chng, C.B.; Lee, P.S.; Chui, C.K. Energy Saving Evaluation of an Energy Efficient Data Center Using a Model-Free Reinforcement Learning Approach. Appl. Energy 2022, 322, 119392.
Narantuya, J.; Shin, J.S.; Park, S.; Kim, J.W. Multi-Agent Deep Reinforcement Learning-Based Resource Allocation in HPC/AI Converged Cluster. Comput. Mater. Contin. 2022, 72, 4375–4395.
Biemann, M.; Scheller, F.; Liu, X.; Huang, L. Experimental Evaluation of Model-Free Reinforcement Learning Algorithms for Continuous HVAC Control. Appl. Energy 2021, 298, 117164.
Van Le, D.; Liu, Y.; Wang, R.; Tan, R.; Wong, Y.-W.; Wen, Y. Control of air free-cooled data centers in tropics via deep reinforcement learning. In Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, New York, NY, USA, 13–14 November 2019; ACM: New York, NY, USA, 2019; pp. 306–315.
Zhang, C.; Kuppannagari, S.R.; Kannan, R.; Prasanna, V.K. Building HVAC scheduling using reinforcement learning via neural network based model approximation. In Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, New York, NY, USA, 13–14 November 2019; ACM: New York, NY, USA, 2019; pp. 287–296.
Pigott, A.; Crozier, C.; Baker, K.; Nagy, Z. GridLearn: Multiagent Reinforcement Learning for Grid-Aware Building Energy Management. Electr. Power Syst. Res. 2022, 213, 108521.
Deltetto, D.; Coraci, D.; Pinto, G.; Piscitelli, M.S.; Capozzoli, A. Exploring the Potentialities of Deep Reinforcement Learning for Incentive-Based Demand Response in a Cluster of Small Commercial Buildings. Energies 2021, 14, 2933.
Fu, Q.; Chen, X.; Ma, S.; Fang, N.; Xing, B.; Chen, J. Optimal Control Method of HVAC Based on Multi-Agent Deep Reinforcement Learning. Energy Build. 2022, 270, 112284.
Sun, Y.; Zhang, Y.; Guo, D.; Zhang, X.; Lai, Y.; Luo, D. Intelligent Distributed Temperature and Humidity Control Mechanism for Uniformity and Precision in the Indoor Environment. IEEE Internet Things J. 2022, 9, 19101–19115.
Zhao, H.; Wang, B.; Liu, H.; Sun, H.; Pan, Z.; Guo, Q. Exploiting the Flexibility Inside Park-Level Commercial Buildings Considering Heat Transfer Time Delay: A Memory-Augmented Deep Reinforcement Learning Approach. IEEE Trans. Sustain. Energy 2022, 13, 207–219.
Xu, D. Learning Efficient Dynamic Controller for HVAC System. Mob. Inf. Syst. 2022, 2022, 4157511.
Yu, L.; Sun, Y.; Xu, Z.; Shen, C.; Yue, D.; Jiang, T.; Guan, X. Multi-Agent Deep Reinforcement Learning for HVAC Control in Commercial Buildings. IEEE Trans. Smart Grid 2021, 12, 407–419.
Pinto, G.; Deltetto, D.; Capozzoli, A. Data-Driven District Energy Management with Surrogate Models and Deep Reinforcement Learning. Appl. Energy 2021, 304, 117642.
Zhang, X.; Biagioni, D.; Cai, M.; Graf, P.; Rahman, S. An Edge-Cloud Integrated Solution for Buildings Demand Response Using Reinforcement Learning. IEEE Trans. Smart Grid 2021, 12, 420–431.
Azuatalam, D.; Lee, W.L.; de Nijs, F.; Liebman, A. Reinforcement Learning for Whole-Building HVAC Control and Demand Response. Energy AI 2020, 2, 100020.

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.

Information

Subjects: Automation & Control Systems

Contributors:

Ayas Shaqour

Aya Hagishima

Update Date: 28 Nov 2022
