The heating, ventilation, and air conditioning (HVAC) system is a major energy consumer in office buildings, and its operation is critical for indoor thermal comfort.
1. Model Predictive Control
Model predictive control (MPC) has become popular in recent years because of its potential for significant HVAC energy savings. MPC uses a building model to predict future building performance, so that optimal control decisions can be made for the current time step. A number of studies have used MPC to control HVAC systems, such as controlling the supply air temperature setpoint for air handling units (AHUs) [1], the on/off status of the HVAC system [2], the ventilation airflow rates [3], and the zone air temperature setpoints [4]; most of them show significant energy savings.
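The receding-horizon idea behind MPC can be sketched as follows. This is a minimal single-zone illustration, not an implementation from any of the cited studies: the first-order thermal model, its coefficients `A` and `B`, the outdoor temperature, and the discrete heating actions are all illustrative assumptions.

```python
# Minimal receding-horizon MPC sketch for a single zone.
# The first-order thermal model and all constants are illustrative.
import itertools

A, B = 0.9, 0.5            # assumed zone dynamics: T' = A*T + B*u + (1-A)*T_out
T_OUT = 5.0                # outdoor air temperature (degC), assumed constant
SETPOINT = 21.0            # zone air temperature setpoint (degC)
HORIZON = 3                # prediction horizon (time steps)
ACTIONS = (0.0, 1.0, 2.0)  # discrete heating inputs (kW), illustrative

def predict(temp, u):
    """One-step prediction with the (assumed) linear zone model."""
    return A * temp + B * u + (1 - A) * T_OUT

def plan(temp):
    """Search all input sequences over the horizon and return the first
    input of the cheapest sequence (the receding-horizon principle)."""
    best_cost, best_u0 = float("inf"), ACTIONS[0]
    for seq in itertools.product(ACTIONS, repeat=HORIZON):
        t, cost = temp, 0.0
        for u in seq:
            t = predict(t, u)
            cost += u + 10.0 * (t - SETPOINT) ** 2  # energy + discomfort
        if cost < best_cost:
            best_cost, best_u0 = cost, seq[0]
    return best_u0

u_now = plan(18.0)  # only the first control action is applied each step
```

At every time step the whole optimization is re-solved with the latest measured temperature, which is why MPC needs a model that is cheap to evaluate many times.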
While promising, MPC is still hard to implement in the real world because of the difficulty of HVAC modeling. Classic MPC requires low-dimensional, differentiable models; for example, the linear quadratic regulator needs linear dynamics and a quadratic cost function [5]. This is difficult for HVAC systems, especially for the supervisory control of centralized HVAC systems, not only because their dynamics are nonlinear but also because they involve a number of control logic rules that make them discontinuous. For example, the control logic for a single-speed direct-expansion (DX) coil may be “turn on the DX coil if the indoor air temperature setpoint is not met in more than five rooms”. Such logic is hard to represent with a continuous mathematical model because of the if-then-else condition. Therefore, in most previous MPC studies, either the building had an independent air conditioner for each room rather than a centralized system (such as [6][7][8]), or MPC was used to directly control the local actuators rather than to set supervisory-level commands (such as [9][10][11]). Neither approach generalizes well to typical multizone office buildings, which usually have centralized HVAC systems and non-uniform HVAC designs.
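The discontinuity can be made concrete by encoding the quoted DX coil rule as a plain function. The zone temperatures and setpoints below are hypothetical inputs; the point is that the output jumps between off and on as a single zone crosses its setpoint.

```python
# Sketch of the rule-based DX coil logic quoted above.
# The "more than five rooms" threshold is from the text;
# the temperature and setpoint inputs are illustrative.
def dx_coil_on(zone_temps, zone_setpoints, max_unmet=5):
    """Turn on the single-speed DX coil if the cooling setpoint
    is not met in more than `max_unmet` rooms."""
    unmet = sum(1 for t, sp in zip(zone_temps, zone_setpoints) if t > sp)
    return unmet > max_unmet
```

Because the return value flips from False to True as one more zone becomes unmet, any cost function involving this rule has a zero gradient almost everywhere, which rules out gradient-based MPC formulations.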
To address the modeling difficulties of MPC for HVAC systems, predictive control based on white-box (physics-based) building models was proposed in [1][6][12]. This method may significantly reduce the modeling effort of MPC, because white-box building models generalize well across different buildings, and a number of software tools are available for building them. However, white-box building models, such as EnergyPlus models, are usually high-dimensional and non-differentiable, so heuristic search must be used for the MPC optimization. Given that white-box building models can be slow to compute, the scalability and real-world feasibility of this type of MPC are questionable.
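With a non-differentiable black-box model, the heuristic search can be as simple as random shooting: sample many candidate schedules, evaluate each with the model, and keep the cheapest. In the sketch below, `simulate_cost` is a hypothetical stand-in for a full simulation run (a real EnergyPlus evaluation would take seconds to minutes per candidate, which is exactly why scalability becomes a concern).

```python
# Random-shooting heuristic search over a black-box cost function.
# `simulate_cost` is a hypothetical placeholder for an expensive
# whole-building simulation; the quadratic toy cost is illustrative.
import random

def simulate_cost(setpoints):
    """Stand-in for a simulation run: returns total energy +
    discomfort cost for a candidate setpoint schedule."""
    return sum((sp - 22.0) ** 2 + 0.1 * sp for sp in setpoints)

def random_shooting(horizon=4, samples=200, lo=18.0, hi=26.0, seed=0):
    """Sample candidate setpoint schedules and keep the cheapest one."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(samples):
        cand = [rng.uniform(lo, hi) for _ in range(horizon)]
        cost = simulate_cost(cand)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best
```

The number of model evaluations grows quickly with the horizon and the number of control variables, so this approach trades modeling effort for computation time.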
2. Model-Free Reinforcement Learning HVAC Control
Since model-based optimal control, such as MPC, is hard to use for HVAC systems, model-free reinforcement learning control is a possible alternative. To the authors’ knowledge, reinforcement learning control for HVAC systems has not yet been well studied: either the reinforcement learning methods used are too simple to reveal their full potential, or the test buildings are too unrealistic. For example, Liu and Henze [13] applied simple discrete tabular Q-learning to a small multizone test facility to control its global thermostat setpoint and thermal energy storage discharge rate for cost savings. Although a limited real-life experiment showed 8.3% cost savings compared with rule-based control, the authors admitted that the “curse of dimensionality” of such a simple reinforcement learning method limited its scalability. In follow-up research by the same authors [14], a more advanced artificial neural network (ANN) was used to replace the tabular Q-learning; however, the results indicate that the ANN did not show clear advantages, probably due to the limited computational resources at that time.
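For reference, the tabular Q-learning used in these early studies reduces to one update rule per observed transition. A minimal sketch, with illustrative states, actions, and reward rather than the actual thermostat and storage variables from the cited work:

```python
# Minimal tabular Q-learning update of the kind used in the early
# HVAC studies cited above. States, actions, and reward are toy values.
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular step: Q(s,a) += alpha * (TD target - Q(s,a))."""
    td_target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])

# Toy setting: discrete zone-temperature states, on/off action.
Q = defaultdict(float)
actions = ("off", "on")
q_learning_update(Q, s="cold", a="on", r=1.0, s_next="ok", actions=actions)
```

The “curse of dimensionality” is visible here: the table `Q` needs one entry per state-action pair, so its size explodes as sensor readings are discretized more finely or more zones are added.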
The deep neural network (DNN) has recently become enormously popular in the machine learning community because of its strong representation capacity, automatic feature extraction, and automatic regularization [15][16][17]. Deep reinforcement learning methods take advantage of DNNs to facilitate end-to-end control, which aims to use raw sensory data, without complex feature engineering, to generate optimal control signals that can directly control a system. For example, Mnih et al. [18] proposed a deep Q-network that could take raw pixels from Atari game frames as inputs and play the games at a human level.
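At the core of the deep Q-network is a temporal-difference target computed from a separate, periodically frozen copy of the network. A sketch under simplifying assumptions: a tiny linear function stands in for the deep network, and the states and weights are illustrative.

```python
# Sketch of the deep Q-network TD target with a separate target network.
# A tiny linear "network" stands in for a DNN; all values are illustrative.
def q_values(weights, state):
    """Q(s, .) for all actions from a linear stand-in for a DNN."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]

def dqn_target(reward, next_state, target_weights, gamma=0.99, done=False):
    """y = r                                if the episode ended,
         y = r + gamma * max_a Q_target(s', a)  otherwise."""
    if done:
        return reward
    return reward + gamma * max(q_values(target_weights, next_state))
```

The online network is trained to regress onto `y`, while the target network's weights are updated only occasionally, which stabilizes learning when the function approximator is a DNN instead of a table.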
Deep reinforcement learning methods have been widely studied not only by machine learning and robotics communities but also by the HVAC control community.
Table 1 summarizes the HVAC control studies performed in recent years using deep reinforcement learning. Researchers have demonstrated via simulations and practical experiments that deep reinforcement learning can improve the energy efficiency of various types of HVAC systems. However, few studies describe the implementation of end-to-end control for multizone buildings. On the one hand, the test buildings in several studies, including [19][20][21][22][23], were single zones with independent air conditioners. On the other hand, conventional deep reinforcement learning methods cannot effectively solve multizone control problems. Yuan et al. [24] showed that directly applying deep Q-learning to a multizone control problem would make the training period too long. Ding et al. [25] proposed a multi-branching reinforcement learning method to solve this problem, but the method required a fairly complicated deep neural network architecture and therefore could not be scaled up to large multizone buildings. Based on deep reinforcement learning, Zhang et al. [26] proposed a control framework for a multizone office building with radiant heating systems. However, “reward engineering” (i.e., a complicated reinforcement learning reward function) was needed to help the agent learn efficiently, so end-to-end control could not be achieved.
Table 1. An overview of studies focusing on deep reinforcement learning methods for HVAC systems.
Reference | Reinforcement Learning Method | Test Building | HVAC System
[19] | Linear state-value function approximation | Two buildings, each having a single zone | Independent air conditioners
[20] | Continuous-action Q-learning | Multiple apartments, each having a single zone | Independent air conditioners
[21] | Continuous-action Q-learning | A hall with multiple single rooms | Independent air conditioners
[22] | Model-assisted fitted Q-learning | A lab room | An independent HVAC unit
[23] | Model-assisted fitted Q-learning | A single chamber | A heat pump system
[24] | Deep Q-learning | A multizone office building | A variable air volume system
[25] | Multi-branching reinforcement learning | A multizone office building | A variable air volume system
[26] | Policy gradient | A multizone office building | Radiant heating systems