According to an International Energy Agency (IEA) report, the building and construction sector accounted for 35% of global energy consumption in 2020 [1]. Among public buildings, heating, ventilation, and air conditioning (HVAC) systems are among the largest energy consumers, accounting for about 60% of total consumption, which corresponds to roughly 12% of global final energy consumption [2,3]. Optimizing energy control strategies in office buildings, one of the primary types of public buildings, and leveraging building envelope structures to enhance energy efficiency offer substantial potential for energy savings and emission reductions [4]. Existing studies have shown that deep reinforcement learning (DRL)-based HVAC control can improve energy efficiency compared to rule-based or model predictive methods [5,6,7], yet most efforts remain limited to HVAC-only optimization or coordination with mechanically operable windows. These approaches do not effectively address buildings equipped with large fixed glazing systems, a common feature of high-rise office buildings, where the lack of operable windows prevents adaptive envelope interaction. Electrochromic windows (ECWs), which can dynamically regulate solar heat gain without mechanical operation, provide a promising alternative, but their integration into multi-zone HVAC control has not been systematically explored. This unresolved gap is the primary motivation for the present work.
Rule-based control (RBC) is widely adopted in HVAC systems because of its simplicity and ease of implementation. However, RBC strategies are typically static and rely heavily on the empirical knowledge of engineers and facility managers [8]. As HVAC systems and building environments become increasingly complex, RBC struggles to adapt to dynamic conditions such as rapid weather variations and changes in solar heat gain, which introduce significant uncertainty into building thermal loads. Model predictive control (MPC) has demonstrated robust performance across a range of building control scenarios, but it also has limitations: developing an accurate model of the building demands significant time and effort, which hinders practical deployment [9]. While both RBC and MPC offer distinct advantages, they exhibit limitations in control performance and in generalizability across diverse building scenarios.
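To illustrate the static nature of such rules, a typical RBC policy reduces to a handful of fixed conditionals. The following minimal sketch is illustrative only; the thresholds and setpoints are assumptions rather than values from any standard:

```python
# Minimal rule-based cooling-setpoint policy of the kind described above.
# Thresholds and setpoints are illustrative assumptions, not from any standard.
def rbc_cooling_setpoint(occupied: bool, outdoor_temp_c: float) -> float:
    if not occupied:
        return 28.0  # setback setpoint (deg C) when the zone is empty
    # Fixed rule: cannot adapt to solar heat gain or rapid weather changes.
    return 24.0 if outdoor_temp_c > 30.0 else 26.0
```

Because the thresholds are fixed at design time, any change in load characteristics requires manual re-tuning, which is precisely the limitation noted above.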
Deep reinforcement learning (DRL), an emerging HVAC control method, possesses characteristics that can address the limitations of RBC and MPC. DRL evaluates both the short-term and long-term consequences of control decisions, and it can adapt to diverse environments and building configurations by learning from simulations or real-world interactions. Its ability to autonomously learn optimal control policies makes it one of the most promising methods for building energy management. Furthermore, DRL-based controllers learn directly from operational data, eliminating the complex building and energy-system modeling that MPC requires [10]. Current DRL-based methods primarily focus on improving temperature control algorithms and have demonstrated notable improvements in energy efficiency [5,6,7]. Several studies have extended this concept to the synergistic control of HVAC systems and building envelope elements, such as windows, showing even greater energy-saving potential [11,12,13].
However, most of these DRL-based methods adopt single-agent algorithms, whose joint state and action spaces grow exponentially as the number of controlled zones increases, leading to the dimensionality explosion problem. Moreover, they focus primarily on mechanically operated traditional windows, offering no effective solution for buildings equipped with fixed windows.
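To make the dimensionality explosion concrete, consider a hypothetical discretization; the counts below are illustrative assumptions, not our experimental setup:

```python
# Joint action space of a centralized (single-agent) controller vs. the
# per-zone action sets of a multi-agent decomposition. Numbers are placeholders.
n_zones = 10
actions_per_zone = 5 * 4  # e.g., 5 HVAC setpoints x 4 window/ECW states

centralized = actions_per_zone ** n_zones  # one agent picks the joint action
factored = actions_per_zone * n_zones      # one agent per zone

print(f"centralized joint actions: {centralized:.2e}")  # ~1.02e+13
print(f"total per-zone actions:    {factored}")         # 200
```

This exponential-versus-linear contrast is what motivates the multi-agent formulations reviewed below.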
Related Work
DRL-based HVAC control has recently garnered significant attention, with numerous studies aiming to enhance temperature regulation and minimize HVAC energy consumption while maintaining thermal comfort [19,20]. Azuatalam et al. [5] developed a reinforcement learning agent that achieved up to 22% energy savings in single-zone HVAC control scenarios. Kodama et al. [21] proposed a DRL-based method to concurrently operate the HVAC and battery storage systems of a residence. Bereketeab et al. [22] employed the policy-based DRL method PPO-Clip, achieving a 12.6% reduction in heating-coil power consumption and a 6.7% decrease in overall HVAC energy consumption. Hu et al. [23] proposed a novel DRL method, GASAC, which increased the duration of acceptable indoor temperatures by 11.43% and decreased energy consumption by 14.05%. However, as the number of controlled systems increases, the action space of these methods grows exponentially, hindering effective exploration of the state-action space and limiting scalability, which highlights the inherent limitations of single-agent reinforcement learning in multi-system settings.
Compared to single-zone control, multi-zone HVAC control is more complex due to inter-zone thermal interactions. Considerable research has explored the use of DRL for multi-zone HVAC control. Deng et al. [24] proposed a non-stationary DQN method that combines proactive detection of environmental changes with reinforcement learning to adapt to varying building conditions; their approach outperformed standard DQN in both single-zone and multi-zone cases, reducing energy consumption by 13% and improving comfort by 9%. Wang et al. [25] applied DQN to multi-zone HVAC optimization, demonstrating improvements in energy efficiency and comfort. Both studies, however, emphasized that scalability remains a critical issue, since the action space grows exponentially with the number of zones. Blad et al. [26] introduced an LSTM-enhanced, Q-learning-based multi-agent framework for real-time HVAC optimization, reporting a 19.4% reduction in heating energy use compared to RBC. Zhang et al. [27] developed a BEM-DRL framework using the A3C algorithm together with Bayesian optimization and genetic algorithms, achieving a 16.7% reduction in heating demand relative to RBC. Li et al. [28] explored demand response scheduling of household appliances using Trust Region Policy Optimization (TRPO), extending DRL to demand-side management.
Other studies investigated continuous-action control. Wang et al. [29] applied both DQN and DDPG to multi-zone HVAC systems, showing that the DDPG-based method improved comfort while achieving a 10.06% energy saving compared with RBC. Gao et al. [30] combined GRUs with DRL to capture time-series dynamics, leading to a 14.5% reduction in total energy use and an 88.4% improvement in comfort performance compared to standard DRL. Li et al. [31] applied DDPG to a two-zone system and demonstrated superior performance over DQN, with a 15% gain in energy efficiency and a 79% reduction in comfort violations.
It is noteworthy that the aforementioned studies did not address the dimensionality explosion problem, whereby the global state and action spaces expand exponentially with the number of zones when a single agent simultaneously manages multiple HVAC systems. In contrast, multi-agent deep reinforcement learning (MADRL) methods alleviate this issue by distributing control across multiple agents, improving coordination among them and enabling better overall control performance [32]. For instance, Liang et al. [33] adopted a MADRL approach incorporating an attention mechanism. Shen et al. [34] proposed a multi-agent co-optimization framework that integrates D3QN and DDPG, enabling multiple agents to simultaneously control both continuous and discrete actions; the framework reduced thermal discomfort duration by 84.86%. Xue et al. [8] introduced a hybrid GA-MADDPG approach for multi-zone HVAC control, achieving superior energy efficiency and thermal comfort compared to other DRL methods. Li et al. [35] developed a multi-agent thermal control framework that models each zone as an independent agent; TRNSYS-based simulations validated its effectiveness in improving energy efficiency and thermal comfort. Liu et al. [32] proposed a MADRL-based method for multi-zone HVAC control that autonomously adjusts temperature setpoints in each zone through agent collaboration, achieving 51.09% and 4.34% reductions in power costs compared to RBC and single-agent DRL methods, respectively, while maintaining thermal comfort across zones.
The aforementioned methods have demonstrated promising outcomes. However, most MADRL-based HVAC control studies focus solely on the HVAC system itself, ignoring the interactions between the HVAC system and other building elements (e.g., windows and smart windows) in improving the building environment.
In an early study, Chen et al. [12] proposed a Q-learning-based DRL method for the joint control of HVAC systems and windows, achieving a 23% reduction in HVAC energy consumption and an 80% decrease in discomfort hours, while also demonstrating effective humidity regulation. Ding et al. [13] introduced the OCTOPUS system, which jointly controls heating, cooling, and window operations, yielding significant energy savings while maintaining occupant comfort; compared to the RBC used in LEED Gold-certified buildings, the system improved energy efficiency by 14.26% and outperformed recent DRL methods by 8.1%, demonstrating the potential of multi-device collaborative control for building energy optimization. Xin et al. [2] leveraged the ASHRAE Global Occupant Behavior Database to enable coordinated control of HVAC systems and windows; the approach generalized across four climatic regions, with a 24% increase in thermal comfort hours and a 24.7% reduction in HVAC energy consumption. Li et al. [36] proposed a co-simulation framework integrating Building Energy Simulation (BES) and Computational Fluid Dynamics (CFD) with real-time control of HVAC systems and windows; it achieved a 68.5% improvement in thermal comfort and a 43.5% reduction in daily cooling energy consumption compared to fixed-schedule operation, significantly enhancing real-time control accuracy and enabling co-optimization of comfort and energy efficiency.
Among advanced window glazings, ECWs are particularly attractive for low-energy building applications [16,17]. ECWs regulate the radiant energy entering a building by changing their light transmittance under a low applied voltage [37]. Sadooghi [38] showed that judicious management of switchable glass can improve building energy efficiency, with ECW applications reducing reliance on air conditioning systems. Reynisson [39] compared the energy efficiency of ECWs against conventional windows, both with and without blinds, in many European cities, concluding that energy consumption could be reduced by 10–30% relative to windows with operable blinds and by 50–75% compared to windows without blinds, which suggests that ECWs can significantly reduce energy consumption. Oh et al. [40] examined the thermal load performance of ECWs in Korea and found that ECWs yield the lowest heating and cooling loads compared to alternatives such as blinds and low-emissivity double-glazed units, achieving a 31.4% reduction in total loads relative to the reference window. Most current research has focused on the independent energy-saving effects of ECWs, while the synergistic control mechanisms between HVAC systems and ECWs remain insufficiently explored. Dussault et al. [41] found that ECWs reduce the peak cooling load of office buildings, potentially creating new optimization opportunities for HVAC load response strategies. However, the thermal comfort and energy performance of integrated HVAC and ECW control strategies have not yet been studied.
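From a control standpoint, an ECW enters the action space as a small set of discrete tint states. The sketch below illustrates one such representation; the transmittance and solar heat gain coefficient (SHGC) values are placeholder assumptions, since actual values depend on the glazing product:

```python
# Illustrative discrete ECW tint states for a controller's action set.
# Optical properties are placeholder assumptions, not measured product data.
ECW_STATES = {
    0: {"visible_transmittance": 0.60, "shgc": 0.45},  # clear
    1: {"visible_transmittance": 0.40, "shgc": 0.30},  # light tint
    2: {"visible_transmittance": 0.20, "shgc": 0.18},  # medium tint
    3: {"visible_transmittance": 0.05, "shgc": 0.09},  # fully tinted
}
```

Because switching is electrical rather than mechanical, such states can be actuated even in fixed glazing, which is what makes joint HVAC and ECW control feasible in curtain-wall buildings.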
In summary, although existing research on MADRL-based HVAC control and ECW applications demonstrates promising potential, most studies still face two key limitations. First, single-agent reinforcement learning algorithms suffer from the dimensionality explosion problem in multi-zone control scenarios, which severely limits scalability. Second, the synergistic optimization between HVAC systems and fixed windows has been largely overlooked, especially in high-rise office buildings where large areas of non-operable glazing are prevalent. These gaps motivate our work, which proposes a MADRL-based control framework to jointly optimize HVAC and ECW operations.
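Such joint optimization is typically driven by a reward that trades off energy use against comfort. The following per-zone form is a minimal sketch; the weights and comfort band are illustrative assumptions rather than the exact reward used in this work:

```python
# Minimal per-zone reward balancing energy consumption and comfort violations.
# Weights and comfort band are illustrative assumptions.
def zone_reward(energy_kwh: float, temp_c: float,
                t_low: float = 22.0, t_high: float = 26.0,
                w_energy: float = 1.0, w_comfort: float = 10.0) -> float:
    violation = max(0.0, t_low - temp_c) + max(0.0, temp_c - t_high)
    return -(w_energy * energy_kwh + w_comfort * violation)
```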
The proposed framework integrates ECWs with adjustable light transmittance into HVAC optimization, aiming to enhance building energy optimization strategies based on the findings and limitations of previous research. The framework is evaluated through simulations conducted in the EnergyPlus environment [42]. It incorporates an optimization scheme that effectively balances each agent's individual and collective objectives, improving both the energy efficiency and the thermal comfort delivered by the HVAC system. The main contributions of this research are summarized as follows:
- 1.
We propose a multi-zone control framework that couples HVAC systems with ECWs, advancing the conventional HVAC and operable-window synergy to adaptive HVAC and ECW coordination. Designed for high-rise buildings with fixed glazing, the framework jointly optimizes HVAC operation and ECW transmittance for practical smart-building deployment. It addresses a key gap in previous work: the absence of coordinated HVAC and ECW control strategies for buildings dominated by fixed curtain walls and large fixed glazing.
- 2.
We adopt the Q-value Mixing (QMIX) algorithm and exploit its monotonic value factorization to encode inter-zone dependencies, enabling cooperative multi-zone optimization while mitigating the exponential growth of the joint action space as the number of zones increases. Compared with other multi-agent methods (e.g., VDN, MADQN, MAPPO), QMIX better captures inter-zone thermal coupling, thereby achieving joint action combinations that are closer to optimal (a minimal sketch of the QMIX mixing network is given after this list). This addresses a key gap: a scalable multi-zone controller that explicitly models thermal coupling while remaining tractable as the number of zones grows.
- 3.
The model was evaluated in terms of training efficiency, energy performance, and thermal comfort violation rate, and benchmarked against other multi-agent deep reinforcement learning methods. In addition, performance trade-offs and robustness were systematically analyzed under varying control priorities and climate conditions.
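For reference, the core of QMIX is a state-conditioned mixing network whose non-negative weights enforce monotonicity, so that each zone maximizing its own Q-value also maximizes the joint value. The following PyTorch sketch illustrates the standard mixer; it is a minimal illustration under assumed dimensions, not our full implementation:

```python
# Minimal QMIX mixing-network sketch. Hypernetworks map the global state to
# mixing weights; torch.abs keeps the weights non-negative (monotonicity).
import torch
import torch.nn as nn

class QMixer(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks conditioned on the global state.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents) per-zone Q-values; state: (batch, state_dim).
        bs = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        q_tot = torch.bmm(hidden, w2) + b2  # (batch, 1, 1)
        return q_tot.view(bs, 1)
```

The torch.abs applied to the hypernetwork outputs guarantees that the partial derivative of the joint value with respect to each per-zone Q-value is non-negative; this monotonicity constraint is what allows decentralized greedy action selection to recover the centralized greedy joint action.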