1. Introduction
As people spend most of their time in buildings, indoor lighting conditions significantly affect human health, primarily through visual comfort [1]. Sunlight exposure improves the psychological and mental well-being, as well as the productivity, of building occupants [2,3]. However, uncontrolled solar penetration can quickly produce disabling glare and significant solar heat gains, degrading visual comfort and increasing space-cooling energy use [2]. Building operations account for roughly 30–35% of global final energy consumption, and space conditioning (HVAC) is commonly one of the largest end uses, at around 35% or more of building energy in many contexts, with reported shares extending higher depending on building type, climate, and system efficiency [4]. Consequently, managing indoor sunlight through coordinated control of work-plane illuminance, glare, shading position/optics, and related façade and lighting systems is critical both to deliver occupant visual comfort and to moderate cooling loads during peak demand periods [4]. An effective control strategy seeks to maximize useful daylight while maintaining illuminance within target ranges and keeping daylight glare probability (DGP) below accepted discomfort thresholds, thereby supporting occupant satisfaction and reducing energy use [5].
Established as a prevailing global design trend, glazed facades are widely used to convey transparency and embody the image of modern architecture [6]. Meanwhile, responsive facades have become popular, allowing for dynamic building designs while enhancing energy efficiency and indoor comfort in highly glazed buildings [2,7,8,9,10]. As high-performance facades, responsive facades can change geometries based on changing conditions to achieve a control objective [11,12,13]. Responsive facades are evolving and advancing due to technological enhancements and the application of innovative geometries [14].
Offering significant advancements over traditional shading devices, kinetic responsive facades feature three-dimensional, adjustable elements that provide a wide range of movements, including folding, rotating, translating, extracting, and contracting [15]. An example is the origami-inspired responsive façade of the Al Bahar Towers in Abu Dhabi, consisting of 1049 individual units that dynamically change their fold angles to modulate daylight [2]. Unlike unified control systems, kinetic façade systems incorporate many independent moving components, each requiring individual control. Loonen et al. [16] described the control complexity of a responsive facade using the term "degree of adaptivity," implying that a larger possible action space raises the potential to improve the indoor lighting environment. Despite their ability to adapt to environmental changes and occupancy preferences, the lack of automatic control strategies limits the full potential of responsive facades [17]. Consider a responsive facade with M components, each with N possible angle states, resulting in a total of N^M possible states [16]. This combinatorial explosion complicates determining the optimal control strategy, making conventional design strategies impractical.
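As a rough illustration (the component count M here is hypothetical; the 19 angle states match the 5° discretization used later in this paper), a few lines of Python show how quickly the joint search space outgrows both unified and per-component strategies:

```python
# Illustrative scale of the joint action space; M is a hypothetical element count.
N = 19         # angle states per element: 0-90 degrees in 5-degree steps
M = 36         # number of independently controlled facade elements (example value)

unified = N            # unified control: one shared angle for all elements
per_component = N * M  # simplified per-component search, as in Shen and Han [23]
joint = N ** M         # exhaustive joint search over all combinations
print(f"unified: {unified}, per-component: {per_component}, joint: {joint:.2e}")
# joint is on the order of 10^46 -- far beyond exhaustive simulation
```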
Automated control algorithms for kinetic responsive facades remain limited in the state of the art. Most existing studies do not focus on exploring the optimal control solution for each independent component, but rather on controlling all components uniformly [18,19,20,21]. Dev and Saifudeen [19] conducted simulation-based research to explore an optimal facade control strategy for dynamically balancing daylight utilization and reducing heat gain in tropical regions. The study used uniform slat angles for facade components on each orientation and evaluated the control performance on three representative days, considering two specific hours per day [19]. The facade control of the Al Bahar Towers operates on a rule-based system, where the kinetic facade dynamically adjusts its configuration based on the angle at which the solar rays land on the curtain wall [22]. An incidence angle between 0 and 79 degrees leads to an unfolded configuration, an angle between 80 and 83 degrees leads to a partly folded configuration, and an angle over 83 degrees leads to a fully folded configuration. However, each facade component most directly affects specific indoor spaces, so unified control cannot achieve truly optimal performance.
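A minimal Python sketch of this rule-based logic, with the thresholds taken from the description above (the function name and state labels are ours):

```python
# Sketch of the Al Bahar Towers rule-based fold logic described in the text.
def al_bahar_configuration(incidence_angle_deg: float) -> str:
    """Map the solar incidence angle on the curtain wall to a fold state."""
    if incidence_angle_deg <= 79:
        return "unfolded"
    elif incidence_angle_deg <= 83:
        return "partly folded"
    else:
        return "fully folded"
```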
To make a decentralized control strategy for kinetic facades possible, Shen and Han [23] proposed a simplified method that finds the best solution for each component and combines them to form the overall facade control solution. This reduces the simulation effort from N^M to N × M evaluations, allowing the optimal solution to be found efficiently. However, in the simplified model, each component's influence on the indoor lighting environment is treated independently, and the combined operational result is approximated as the sum of the results from individual components. This approach ignores the interactions between different elements, such as neighboring components blocking sunlight from their surroundings, resulting in a suboptimal control strategy. Takhmasib et al. [24] conducted the first on-site study of a three-dimensionally movable kinetic façade controlled by artificial intelligence (AI) to enhance the real-time probability of achieving desirable indoor daylight levels. To develop the AI-based control model, they used Radiance to generate 20,000 simulation cases for model training. While this approach enabled decentralized control of façade elements, it required extensive time and labor for data preparation. These limitations highlight the necessity for a self-learning control framework for kinetic façades that can adapt to varying settings without the need for extensive data preparation.
Compared with kinetic facade control studies, traditional shading device control strategies have been extensively studied, typically falling into three main categories based on their performance criteria [25]. The first category includes threshold controllers, which activate or adjust the blinds to a predefined slat angle when a control variable, such as solar illuminance or irradiance, reaches a setpoint [25]. The second category involves sun-tracking controllers that dynamically adjust the slat angle to block direct solar radiation [25]. In the final category, mode and scene controllers utilize multiple sensors and control algorithms [25]. For instance, Koo et al. [26] proposed a new control method for automated Venetian blinds that maximizes occupant comfort by allowing users to define specific zones for glare protection, thus enhancing daylight penetration while accommodating occupant preferences. Based on occupied-zone sensors, the proposed control algorithm calculates the control actions of multiple components across multiple controlled zones. The first two methods generally apply a unified control approach, where all components share the same slat angles. In contrast, the third category can provide independent control for a limited number of components. Olbina et al. [27] devised a closed-loop control strategy for a vertically split blind system, where different sections are optimized independently to enhance daylight distribution across various daylight zones. The control method by Koo et al. [26] adjusts the position of the lower end of multiple Venetian blinds and their slat angles based on hourly outdoor solar conditions, using open-loop logic to control the blinds in sequence for the occupied area. The blind positions are calculated from the geometric relationship between the solar angle and desk locations in the office. However, when it comes to kinetic responsive facades with numerous components, applying such control logic to optimize each component individually becomes impractical due to the system's complexity and the size of the possible action space.
In office lighting environments, occupants are primarily concerned with the illuminance levels on the horizontal plane at desk height, commonly referred to as workspace planes [28]. To enable occupancy-centered facade control, placing illuminance sensors on workspaces provides real-time lighting data that can inform facade adjustments [29,30,31]. However, considering the complexities of multi-workspace sensor data and multi-element facades, existing control methods face challenges in establishing an effective feedback loop to determine the optimal linkage between sensor data and facade adjustments.
Given the limitations of current control methods for kinetic responsive facades, exploring advanced building control techniques is essential. Reinforcement learning (RL) is particularly suited to managing high-dimensional, complex, and uncertain environments. Unlike other studied control logics, which are often limited to specific simulated or experimental cases and tailored to particular control targets, RL learns the control policy by interacting with the environment through a reward mechanism. The algorithm explores the environment with different control strategies to maximize cumulative rewards. This makes RL less restricted by variations in the real environment, such as building layout, orientation, occupancy patterns, window areas, and shading types, making the method more generalizable and suitable for different buildings and facade designs.
RL has been widely adopted in various building control areas, particularly in HVAC systems, battery management, and appliance control [32,33,34]. It has been found that RL can achieve energy savings for HVAC or heating systems while keeping the indoor environment comfortable [35]. In addition, RL is also believed to be promising due to its capability to learn more complex policies in sophisticated environments [33]. Ding and Du [36] applied RL to explore optimal multi-zone building action sequences, demonstrating its scalability in environments with high-dimensional state and action spaces. RL has also been used to achieve occupant-centered control by learning occupancy behaviors and the indoor lighting environment, balancing energy consumption and occupant comfort [37].
Building on these strengths, applying RL to kinetic facade control is a logical and promising approach to maintaining a comfortable visual environment in response to dynamic indoor and outdoor conditions. By exploring and interacting with the lighting environment, the RL algorithm can find the optimal control strategy that achieves decentralized control and considers the interactions and connections between different components. This allows it to fully leverage the kinetic facade's capabilities, optimizing its benefits in indoor lighting. To the best of the authors' knowledge, prior studies have explored the application of reinforcement learning for controlling blind systems [38], as well as AI-based strategies for kinetic façade control aimed at optimizing the indoor lighting environment [24]. However, no published work has yet proposed an RL-based method specifically for controlling kinetic building facades.
Accordingly, the main contributions of this study are as follows:
Proposes a self-learning RL-based control approach for multi-objective, high-dimensional responsive facade systems, representing the first application of RL to this problem.
Develops an integrated simulation framework that automates RL controller training and lighting simulation, enabling dynamic coupling between the simulation environment and the RL agent.
Validates the proposed RL-based kinetic responsive facade controller on three irregular shading configurations, benchmarking against a rule-based controller (RBC) to quantify feasibility and effectiveness.
2. Methodology
2.1. Reinforcement Learning
Reinforcement learning (RL) is a branch of machine learning concerned with training agents to achieve specific objectives by interacting with their environment and taking sequential actions to maximize a cumulative reward. Unlike supervised learning, where correct actions are explicitly provided through labeled datasets, RL depends on the agent autonomously discovering the optimal actions through experiential interaction, making it particularly valuable for complex, dynamic environments.
Deep reinforcement learning (DRL) integrates reinforcement learning with deep learning, leveraging the representational power of neural networks to approximate complex policies and value functions. This combination allows agents to operate effectively in environments characterized by high-dimensional state spaces, where traditional RL techniques would be computationally intractable.
Within DRL, there are two approaches: model-based and model-free reinforcement learning. Model-based RL involves creating an explicit model of the environment’s dynamics, which allows the agent to predict future states and plan actions accordingly. However, accurately modeling complex environments remains a significant challenge in model-based RL. In contrast, model-free RL directly learns optimal behaviors without requiring an explicit model of the environment’s dynamics, making it more widely used. Model-free DRL includes two main classes: deep Q-learning (DQL) and policy gradient methods. Deep Q-learning focuses on learning the value function to estimate the expected reward of specific actions, while policy gradient methods directly optimize the policy that determines the agent’s actions, often leading to more stable and efficient convergence.
This work employs a stochastic policy-gradient (Monte Carlo) method in the REINFORCE family [39], implemented with deep neural function approximation. REINFORCE is a foundational policy gradient method that optimizes cumulative reward by adjusting policy parameters. As outlined in Equation (1), the derivative of the expected return J is reformulated as a Monte Carlo expectation, enabling its estimation from a large sample of trajectories. This allows for an effective approximation of the gradient of the expected return based on sampled trajectories, making training and learning manageable. The pseudocode for the REINFORCE algorithm is presented in Algorithm 1. The agent takes actions, receives rewards, and updates the policy to increase the likelihood of the actions that lead to higher rewards. To improve the algorithm's stability and convergence speed, a baseline network is incorporated to estimate the value of the current state, thereby reducing the variance of gradient estimates and resulting in more stable and efficient learning [40].
$$\nabla_\theta J(\pi_\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ R(\tau)\, \nabla_\theta \log P(\tau \mid \theta) \right] \approx \frac{1}{N} \sum_{i=1}^{N} R(\tau^{i}) \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta\left(a_t^{i} \mid s_t^{i}\right) \tag{1}$$

where $J(\pi_\theta)$ denotes the expected return of policy $\pi_\theta$; $\tau$ is a trajectory, i.e., a sequence of states $s_t$ and actions $a_t$; $R(\tau)$ denotes the total return for trajectory $\tau$; $P(\tau \mid \theta)$ denotes the probability of trajectory $\tau$ under policy $\pi_\theta$; $\pi_\theta(a_t \mid s_t)$ denotes the policy's probability of taking action $a_t$ in state $s_t$ under parameters $\theta$; $\nabla_\theta$ denotes the gradient with respect to the policy parameters $\theta$; $N$ denotes the number of sampled trajectories in the Monte Carlo estimation; and $i$ denotes the index of each sampled trajectory.
Algorithm 1. REINFORCE with Baseline
Input: a differentiable policy parameterization $\pi(a \mid s, \theta)$
Input: a differentiable state-value function parameterization $\hat{v}(s, \mathbf{w})$
Algorithm parameters: step sizes $\alpha^{\theta} > 0$, $\alpha^{\mathbf{w}} > 0$; number of episodes $N$
Initialize policy parameters $\theta$ and state-value weights $\mathbf{w}$ (e.g., randomly)
for n = 1 to N do
    Generate an episode $s_0, a_0, r_1, \ldots, s_{T-1}, a_{T-1}, r_T$ following $\pi(\cdot \mid \cdot, \theta)$
    for t = 0 to T − 1 do
        $G \leftarrow \sum_{k=t+1}^{T} r_k$
        $\delta \leftarrow G - \hat{v}(s_t, \mathbf{w})$
        $\mathbf{w} \leftarrow \mathbf{w} + \alpha^{\mathbf{w}} \delta \nabla_{\mathbf{w}} \hat{v}(s_t, \mathbf{w})$
        $\theta \leftarrow \theta + \alpha^{\theta} \delta \nabla_{\theta} \log \pi(a_t \mid s_t, \theta)$
    end for
end for
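For concreteness, the following is a minimal PyTorch sketch of Algorithm 1. The environment interface (`env.reset()`, `env.step()`), network sizes, horizon, and learning rates are illustrative placeholders, not the paper's actual implementation:

```python
# Minimal REINFORCE-with-baseline sketch (PyTorch). Dimensions are examples:
# 8 state variables (illuminance readings) and 19 discrete slat angles.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 8, 19
policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(), nn.Linear(64, N_ACTIONS))
baseline = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(), nn.Linear(64, 1))
opt_pi = torch.optim.Adam(policy.parameters(), lr=1e-3)
opt_v = torch.optim.Adam(baseline.parameters(), lr=1e-3)

def run_episode(env, horizon=12):
    """Roll out one episode, recording log-probs, value estimates, and rewards."""
    logps, values, rewards = [], [], []
    state = env.reset()                          # hypothetical: returns a [STATE_DIM] tensor
    for _ in range(horizon):
        dist = torch.distributions.Categorical(logits=policy(state))
        action = dist.sample()
        logps.append(dist.log_prob(action))
        values.append(baseline(state).squeeze(-1))
        state, reward = env.step(action.item())  # hypothetical env interface
        rewards.append(float(reward))
    return logps, values, rewards

def update(logps, values, rewards):
    """One Monte Carlo policy-gradient update with a value baseline."""
    G, returns = 0.0, []
    for r in reversed(rewards):                  # undiscounted returns (gamma = 1)
        G += r
        returns.append(G)
    returns = torch.tensor(list(reversed(returns)))
    delta = returns - torch.stack(values)        # advantage estimate
    baseline_loss = delta.pow(2).mean()          # MSE toward the observed return
    policy_loss = -(delta.detach() * torch.stack(logps)).mean()
    opt_v.zero_grad(); baseline_loss.backward(); opt_v.step()
    opt_pi.zero_grad(); policy_loss.backward(); opt_pi.step()
    return policy_loss.item(), baseline_loss.item()
```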
2.2. RL-Based Control for Kinetic Responsive Facades
The agent interacts with the environment by adjusting the facade components' angles and updates the control policy using a policy gradient algorithm. Human visual comfort is assessed by two kinds of illuminance sensors, which measure the vertical and horizontal illuminance values at each office desk. Rewards are assigned based on the illuminance values measured by the sensors. During the training process, the agent explores various control strategies, earning rewards based on the visual conditions achieved in the workspace. The policy gradient algorithm is used to maximize the cumulative reward over the entire control period.
Figure 1 shows the framework of RL-based optimal kinetic responsive facade control.
This paper selects and tests three types of existing kinetic responsive facades to evaluate the robustness and generalization of the proposed control method. Each facade type features distinct kinetic element motion based on its geometry and structural design. Real-time ClimateStudio [41] simulations were used as the virtual environment to generate lighting data and support the training and testing process. The proposed RL control method offers a computational way for users to find the optimal shading or facade control strategy for various distinct environment settings without extensive control logic exploration, complex physical modeling, or frequent manual control.
2.3. Simulation in a Virtual Environment
A virtual testbed was developed to evaluate the proposed RL-based kinetic facade control method under varying daylight conditions. The office was modeled as a shoebox geometry defined by non-uniform rational B-spline (NURBS) surfaces in Grasshopper [42] and placed facing south without surrounding obstructions. Surface optical properties were assigned according to standard LM-83 material definitions, as summarized in Table 1.
Three kinetic responsive facade cases are modeled based on real-world buildings: LA MAISON Hotel, Campus Kolding, and the Al Bahar Towers. Table 2 presents the real-world building pictures and corresponding facade models for simulation. The geometries were parameterized in Grasshopper to allow rotation of facade elements according to their movement type (vertically folding, diagonally folding, and radially folding). Component dimensions, movement characteristics, and example rotation angles are given in Table 3. The possible rotation range for all elements was 0–90°, incremented in 5° steps.
Lighting simulations were performed in ClimateStudio [41], which integrates the Radiance ray-tracing engine [47] for physically based daylight calculations. The simulation time step was 1 h, and weather data were sourced from the Typical Meteorological Year (TMY3) file for Boston. The lighting simulation selected the summer solstice (21 June) and the September equinox (21 September) as target dates. These two days represent distinct sun positions in the year, as shown in Figure 2, effectively illustrating the variability in solar parameters [48]. On the summer solstice, the solar altitude in Boston (42.36° N, 71.06° W) reaches approximately 71° at solar noon with a wide azimuth range, resulting in the longest period of daylight and shortest night of the year. On the September equinox, the solar altitude at noon is around 47°, and the sun's path is more symmetric between morning and afternoon, producing a more balanced daylight distribution.
The RL training and evaluation were performed using a custom integrated framework. This framework links the ClimateStudio (1.9) simulation environment and the RL algorithm (Python 3.10.8, PyTorch 1.13.1) via a WebSocket-based data exchange platform, as shown in Figure 3. At each simulation step, the simulation environment sends the current state variables, including the horizontal and vertical illuminance at the sensor planes, to the RL agent, which is implemented in Python scripts. The agent then computes control actions (rotation angles for each kinetic element), which are transmitted back to ClimateStudio for the next simulation step. The framework records states, actions, rewards, and timestamps in local log files for analysis.
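A minimal sketch of such a WebSocket exchange is given below, using the Python `websockets` package (≥ 10.1). The message schema, port, and placeholder policy are assumptions for illustration, not the paper's actual protocol:

```python
# Hypothetical WebSocket bridge: the simulation client sends illuminance states,
# the agent replies with rotation angles for the kinetic elements.
import asyncio
import json
import websockets  # pip install websockets

async def agent_handler(ws):
    async for message in ws:
        state = json.loads(message)       # e.g., {"horizontal": [...], "vertical": [...]}
        readings = state["horizontal"] + state["vertical"]
        # Placeholder policy: a fixed 45-degree angle per reading; the real agent
        # would query its policy network here.
        actions = {"angles": [45] * len(readings)}
        await ws.send(json.dumps(actions))

async def main():
    # The ClimateStudio/Grasshopper side would connect as a client to this endpoint.
    async with websockets.serve(agent_handler, "localhost", 8765):
        await asyncio.Future()            # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```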
The RL controller comprises two neural networks: a policy network for action selection and a baseline network for value estimation, enabling variance reduction in policy gradient updates. The reward function penalizes deviations from the target illuminance range and accounts for control smoothness.
2.4. State-Action Space and Reward Calculation
Defining the state-action space for the policy gradient enables the proposed RL model to learn a robust control strategy. Actions ($a$) are the rotation angles of each facade element. Although each facade has a different motion, the possible action space is the same for all: the rotation angle ranges from 0° to 90° in 5° increments, so the possible angles for each element are $A$ = {0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90}. The action at each time step $t$ consists of each element's angle, $\text{Action}_t = \{a_1, a_2, \ldots, a_n\}$, where $n$ is the number of elements.
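One convenient way to realize this high-dimensional discrete action space is a factorized categorical policy that samples one angle index per element; a sketch follows (the element count and network shape are illustrative, not the paper's architecture):

```python
# Sketch: one categorical distribution per facade element over the 19 angles.
import torch
import torch.nn as nn

ANGLES = torch.arange(0, 95, 5)            # {0, 5, ..., 90} degrees -> 19 options
N_ELEMENTS = 36                            # hypothetical number of kinetic elements
STATE_DIM = 8                              # illuminance state variables (defined below)

head = nn.Linear(STATE_DIM, N_ELEMENTS * len(ANGLES))  # shared trunk omitted

def sample_actions(state: torch.Tensor) -> torch.Tensor:
    logits = head(state).view(N_ELEMENTS, len(ANGLES))
    idx = torch.distributions.Categorical(logits=logits).sample()  # [N_ELEMENTS]
    return ANGLES[idx]                     # one rotation angle per element

print(sample_actions(torch.zeros(STATE_DIM)))  # e.g., tensor([45, 0, 90, ...])
```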
The proposed RL control method was tested on a digital Rhino model of an office in Boston, USA, with a south-facing facade. The room dimensions are 9.8 m (width), 11.0 m (length), and 3.0 m (height). Two challenging workspaces are selected: the "Near-Facade" workspace on the south side, which is prone to glare, and the "Far-Facade" workspace on the north side, which cannot easily access enough daylight. Each workspace has dimensions of 1.8 m (length), 1.2 m (width), and 0.7 m (height). Their positions in the room are shown in Figure 4. The state ($S$) comprises the average illuminance levels measured by sensors placed on the horizontal and vertical planes of each workspace, oriented toward the northeast, northwest, southeast, and southwest. Each plane is instrumented with a sensor grid at 0.3 m (1 ft) intervals. The horizontal test plane measures 1.8 m in length and 0.6 m in width, while the vertical test plane measures 0.65 m in width and 0.4 m in height. This configuration enables high-resolution measurement of spatial variations in illuminance, providing a detailed representation of the lighting conditions within the workspace. If the illumination requirements can be satisfied at these workspaces, the middle area of the office can also be expected to remain in a comfortable lighting environment. The state thus includes eight continuous variables:

$$S_t = \{E_h^{NE}, E_h^{NW}, E_h^{SE}, E_h^{SW}, E_v^{NE}, E_v^{NW}, E_v^{SE}, E_v^{SW}\}$$

where $E_h$ and $E_v$ denote horizontal and vertical illuminance levels, respectively. This setup allows the RL agent to account for spatial lighting variations across the office, ensuring task-appropriate lighting and a balanced luminous environment throughout the room.
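As a small illustration, the eight-dimensional state vector could be assembled from the per-plane sensor-grid averages as follows (the dictionary keys and ordering are our own convention):

```python
# Hypothetical assembly of the state vector S_t from mean illuminance readings (lux).
import torch

def build_state(E_h: dict, E_v: dict) -> torch.Tensor:
    order = ["NE", "NW", "SE", "SW"]           # plane orientations from the text
    return torch.tensor([E_h[o] for o in order] + [E_v[o] for o in order],
                        dtype=torch.float32)   # shape [8]

state = build_state({"NE": 310, "NW": 295, "SE": 820, "SW": 760},
                    {"NE": 180, "NW": 170, "SE": 640, "SW": 590})
```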
The reward is designed to help the agent achieve a visually comfortable environment. Various glare metrics have been developed to assess visual comfort. To ensure a sufficient indoor light level while minimizing glare, the indoor illuminance values on the work surface and the DGP from occupants' viewpoints are often combined in indoor lighting analysis. The Lighting Handbook provides recommended values for office buildings in both the vertical and horizontal directions [49]. While different illuminance targets have been used in previous studies, this study uses the recommended targets for computer screens and electronic ink devices: 300 lux for horizontal illuminance and 150 lux for vertical illuminance [49]. The comfort range of horizontal illuminance is typically defined as between 300 lux and 1200 lux [50]. In laboratory settings, illuminance levels are recommended to be between 750 lux and 1200 lux to ensure adequate visibility for detailed tasks [51]. However, specific upper limits for indoor vertical illuminance are not universally established, and this study defines 1200 lux as the upper bound. The proposed control method can be trained for different control tasks and targets, so different comfort ranges can be selected as needed.
In contrast to other widely used glare measures, Wienold and Christoffersen [52] defined DGP to describe the probability that occupants perceive glare problems caused by natural sunlight. For glare control, vertical illuminance can be used to calculate the DGP since it strongly correlates with the occupant's visual comfort [53]. Because the full DGP requires multiple variables to compute, the simplified DGP (DGPs) can be used instead, calculated as in Equation (2):
$$\text{DGPs} = 6.22 \times 10^{-5} \, E_v + 0.184 \tag{2}$$

where $E_v$ is the vertical illuminance value (lux). The suggested DGP ranges are labeled as imperceptible glare below 0.35, noticeable glare between 0.35 and 0.40, disturbing glare between 0.40 and 0.45, and intolerable glare above 0.45 [53].
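In code, Equation (2) and these glare bands read as follows (the function names are ours):

```python
# Simplified daylight glare probability (Eq. (2)) and its comfort bands.
def dgps(E_v: float) -> float:
    """Simplified DGP from vertical eye illuminance E_v in lux."""
    return 6.22e-5 * E_v + 0.184

def glare_band(p: float) -> str:
    if p < 0.35:
        return "imperceptible"
    elif p < 0.40:
        return "noticeable"
    elif p < 0.45:
        return "disturbing"
    return "intolerable"

print(glare_band(dgps(2000.0)))  # DGPs = 0.3084 -> "imperceptible"
```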
Based on the comfort range, a positive reward is assigned for horizontal illuminance between 300 lux and 600 lux and for vertical illuminance between 150 lux and 500 lux. The reward for "Near-Facade" workspaces is higher than for "Far-Facade" workspaces due to the increased risk of glare near the facade. Additionally, if the illuminance level exceeds 8000 lux at any test plane, the reward is set to 0. This penalization accelerates learning by discouraging the exploration of undesirable state-action pairs during early-stage training. Each workspace has two horizontal test planes and two vertical test planes. The total reward for each time step is the sum over all test planes. The reward calculation pseudocode is shown in Algorithm 2. The hyperparameters of the RL model are presented in Table 4.
Algorithm 2. Pseudocode for reward calculation
For "Near-Facade" workspaces:
    if horizontal illuminance ∊ [300, 600] lux → reward += 4
    if vertical illuminance ∊ [150, 500] lux → reward += 4
For "Far-Facade" workspaces:
    if horizontal illuminance ∊ [300, 600] lux → reward += 1
    if vertical illuminance ∊ [150, 500] lux → reward += 1
If ∃ any test plane with illuminance ≥ 8000 lux → reward = 0
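A direct Python transcription of Algorithm 2 (the argument structure is our own; the thresholds and weights follow the pseudocode):

```python
# Per-time-step reward from Algorithm 2. Each workspace contributes
# (horizontal_lux, vertical_lux) pairs, one per pair of test planes.
def step_reward(near_planes, far_planes):
    all_values = [v for pair in near_planes + far_planes for v in pair]
    if any(v >= 8000 for v in all_values):   # hard cutoff: excessive daylight anywhere
        return 0
    total = 0
    for h, v in near_planes:                 # glare-prone zone, weighted 4x
        total += 4 if 300 <= h <= 600 else 0
        total += 4 if 150 <= v <= 500 else 0
    for h, v in far_planes:                  # daylight-starved zone, weighted 1x
        total += 1 if 300 <= h <= 600 else 0
        total += 1 if 150 <= v <= 500 else 0
    return total

print(step_reward([(450, 300), (520, 210)], [(310, 160), (250, 140)]))  # -> 18
```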
3. Results
To evaluate the robust control capability of the proposed RL-based façade controller, RBC is used as a baseline for comparison. For this comparison, three key environmental metrics are employed: hourly horizontal illuminance, vertical illuminance, and DGP at the work planes. These metrics collectively assess the controller’s ability to maintain a visually comfortable indoor environment for occupants. In addition, to evaluate the learning process of the RL agent, we track three training-related metrics: cumulative reward, policy loss, and baseline loss. Cumulative reward reflects the agent’s overall performance over time, policy loss measures the improvement of the policy through gradient updates, and baseline loss evaluates the accuracy of value estimation for variance reduction. Together, these metrics provide insight into both the control performance and the learning stability of the proposed RL approach.
Because sunlight is typically insufficient by the end of working hours, the facade is left open at day's end; accordingly, in the RBC strategy, each day starts with the facade opened at 45 degrees. To ensure a fair comparison between the RL and RBC, both rely on the current lighting conditions in the workspace to determine the slat angle adjustments for the next time step. In the RBC approach, all facade elements are uniformly adjusted based on the average illuminance level of the test planes. The target comfort range is between 300 and 600 lux, and the available slat angles are identical for both control strategies. If the average illuminance of the test planes falls below the lower limit of the comfort range, the slat angle increases by 25 degrees in the next step. Conversely, the slat angle decreases by 25 degrees if it is above the upper limit. The control logic is summarized in Figure 5.
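A compact sketch of this rule (the function name is ours; the thresholds and 25-degree step come from the text):

```python
# One RBC update: all elements share the slat angle, nudged in 25-degree steps.
def rbc_step(angle: int, mean_illuminance: float) -> int:
    if mean_illuminance < 300:      # under-lit: increase the angle
        angle += 25
    elif mean_illuminance > 600:    # over-lit: decrease the angle
        angle -= 25
    return max(0, min(90, angle))   # clamp to the 0-90 degree range

angle = 45                          # each day starts at 45 degrees
angle = rbc_step(angle, 250.0)      # -> 70
```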
3.1. Reward
Figure 5 and Figure 6 show that the cumulative reward curves vary significantly across the different facades. The vertically and diagonally folding facades exhibit a clear upward trend in cumulative rewards, indicating an effective learning process toward an optimal control policy. The cumulative reward for the radially folding façade oscillates around a high level throughout the training process; this consistent performance indicates that the initial parameters already yielded a near-optimal control policy.
3.2. Policy Loss and Baseline Loss
The Baseline Loss is the mean squared error (MSE) between the actual cumulative reward and the value estimated by the Baseline network, encouraging the Baseline network to accurately predict the cumulative reward for a given state. The Policy Loss is the difference between the actual cumulative reward and the Baseline network's estimate, weighted by the log probability of each action given the state. It drives the Policy network toward policies that yield higher cumulative rewards.
Figure 6 and Figure 7 show that the Policy and Baseline losses converge after 100 episodes. This indicates that the model has sufficiently explored the environment and learned the relationship between the shading facade elements' angles and the resulting illuminance levels in the workspace. The Baseline network can reliably estimate the value of a given state (the illuminance values on the horizontal and vertical planes), and the Policy network learns a policy that keeps the illuminance within the comfortable range.
3.3. Horizontal and Vertical Illuminance Results
The vertically folding facade is modeled after La Maison's facade [44], which comprises foldable panels gliding continuously along a track system. The shutters function as efficient shading elements, optimizing natural light transmission from the exterior. They dynamically adjust to varying sunlight angles throughout the day, enhancing the building's exterior aesthetics.
Most of the "Near-facade" test planes under the proposed RL control maintain illuminance levels within the comfort range (highlighted in gray) throughout the summer solstice (Figure 8 and Figure 9). However, illuminance in the "Far-facade" workspaces falls below 300 lux. This is because the RL model prioritizes glare reduction in the "Near-facade" workspaces, which leads to insufficient daylight reaching the "Far-facade" workspaces. Overall, the RL controller applied to the vertically folding facade performs effectively during the summer solstice, demonstrating its ability to achieve the control targets at different times of day.
The proposed RL controller performs better at the fall equinox than at the summer solstice, with a higher percentage of "Far-facade" workspaces maintaining illuminance levels within the visual comfort range. A notable difference is that indoor illuminance on both the vertical and horizontal planes varies more during the fall equinox. The "Far-facade" workspaces (Northeast and Northwest) exhibit a more stable lighting level than the "Near-facade" workspaces (Southeast and Southwest) throughout the day. The horizontal illuminance of the "Near-facade" work planes can reach nearly 1000 lux at noon. Consequently, during the afternoon, the test planes' illuminance levels dropped as the facade system increased its closure to mitigate glare and excessive daylight. The indoor lighting environment changed between the fall equinox and summer solstice, corresponding with seasonal variations in the sun's path.
In contrast to RL-based control, RBC systems face more significant challenges in balancing daylight harvesting with glare mitigation. This is particularly evident on 21 September, when horizontal test planes reveal oscillating lighting levels throughout the day, resulting in an unstable lighting environment for occupants. The RBC struggles to maintain a consistent lighting environment across all occupied workspaces. While the ‘Far-facade’ workspaces remain within the comfort range, the ‘Near-facade’ areas are consistently over-lit.
The diagonally folding facade is based on the Kolding Campus building [45]. Its solar shading system comprises approximately 1600 perforated triangular steel shutters. These shutters are strategically installed on the facade, allowing them to dynamically adjust to sunlight conditions and regulate the amount of light entering the building. For the diagonally folding façade under RL control, the indoor lighting environment shows a more stable pattern than for the vertically and radially folding facades. The horizontal illuminance remains around 300 lux without significant fluctuations across both test days. In terms of vertical illuminance, the southwest workspace is considerably higher than the other three workspaces due to its closeness to the facade. Meanwhile, the other three workspaces remain consistent, similar to the stability seen in the horizontal illuminance, without significant variation over the two days. This stability may be attributed to the diagonally folding facade having more individually controlled components, which enhance its adaptability to changing conditions. This observation also suggests that future reward function design should consider not only target lux levels but also the stability of illuminance over time. The RBC on the diagonally folding facade performs well on 21 September, with almost all workspaces maintaining illuminance levels within the comfort range. However, on 21 June, horizontal illuminance levels in the "Far-facade" workspaces fall below the lower bound of the comfort range, leading to an insufficient lighting environment.
The radially folding facade is inspired by the Al Bahar Towers in Abu Dhabi [46], where the fundamental element is triangular. The control performance closely mirrors that of the vertically folding façade, with both facades under RL control successfully maintaining comfortable illuminance levels in the areas near the façade. However, some workspaces farther from the facade receive insufficient sunlight, particularly in terms of horizontal illuminance during the summer solstice. The RBC on the radially folding facade performs similarly to the vertically folding facade, with illuminance levels in the "Far-facade" workspaces significantly lower than the comfort range. Despite the geometric differences between the vertically and radially folding facades, the similarity in control performance may be attributed to their similar rotational motion, with all components opening from a central point or axis.
Figure 10 presents the unmet illuminance percentages for the different shading systems on the horizontal and vertical test planes, measured on 21 June and 21 September. For all three test façades, the RL approach outperforms RBC. RL reduces the unmet percentage by approximately 10% on the vertically and radially folding facades. For the diagonally folding façade, where both RL and RBC achieve their lowest unmet percentages, RL still provides an additional 3% reduction. Under RL control, the unmet illuminance primarily occurs in "Far-facade" workspaces, where the illuminance falls below the comfortable range. This trend aligns with the RL controller's reward design, which prioritizes maintaining "Near-facade" workspaces within the comfort range to mitigate glare issues. While artificial lighting can mitigate insufficient daylight, managing glare remains more challenging once it occurs. The RBC performs worse than the RL-based control across both days, indicating its lower adaptability to dynamic sunlight conditions. For both control strategies, horizontal illuminance exhibits a higher unmet percentage; future studies can focus on improving horizontal illuminance control to enhance overall performance. In conclusion, the proposed RL-based controller effectively balances daylight utilization and glare reduction across various facade designs. On average, 72.92% of the test planes maintain illuminance levels within the comfort range, while only 27.08% fall short of the target, highlighting the controller's adaptability and effectiveness.
As shown in Figure 11, all tested facade configurations significantly reduce DGP values compared to the scenario without a facade, consistently keeping them below 0.35. This reduction demonstrates a substantial improvement in glare control, enhancing visual comfort for occupants. The façades effectively mitigate excessive daylight penetration and glare, contributing to a more balanced indoor luminance distribution. Additionally, variations in DGP across facade types suggest that some configurations are more effective at reducing glare than others, highlighting the importance of optimizing façade designs for daylighting performance and occupant comfort. Both the RL and the RBC maintain workspace DGP below 0.35: the RBC adjusts slat angles based on average illuminance, keeping the illuminance level on each test plane below a high threshold and thereby mitigating glare. The RL controller, however, dynamically adjusts slat angles according to the illuminance levels of each workspace's test planes, optimizing each façade component individually, whereas the RBC applies a uniform, fixed-step slat angle adjustment based on the average illuminance across all test planes, limiting its responsiveness to changing daylight conditions. As a result, the RBC's glare management is less effective, particularly in workspaces with uneven daylight distribution.
Overall, the proposed RL-based control method successfully creates a comfortable visual environment for the occupied space, as evaluated on the test planes. Across all three tested façades, glare is effectively prevented throughout the evaluation period. While the control method strives to balance glare prevention with providing sufficient daylight to occupied workspaces, glare prevention is prioritized: glare can only be mitigated through shading devices, whereas artificial lighting can compensate for insufficient daylight. The control results reflect this approach, with glare completely avoided, though the far-facade workspaces occasionally receive inadequate daylight. This lack of daylight may also be related to the deeper-than-wide layout of the test room, meaning the control method is not the sole factor contributing to the insufficient daylight. The issue could be further mitigated with an artificial lighting system to supplement natural light. Despite these limitations, the proposed RL control method demonstrates robust performance across the different facade designs, maintaining an effective balance between glare mitigation and daylight availability.
4. Discussion
Although this paper proposes an RL-based controller for a kinetic façade to provide a comfortable lighting environment for workspaces, several limitations concern the real-world application of RL. Firstly, the study relies on simulation as a substitute for real building environments. Implementing such a learning system in actual buildings introduces hardware challenges, including determining appropriate sensor locations, integrating controllers capable of running RL algorithms, and installing actuators to adjust the slat angles of each façade component. These hardware-related issues are neither discussed in this paper nor adequately addressed in current RL-based controller studies. Additionally, the safety and reliability of RL control outcomes remain a concern. Unlike RBC or model predictive control (MPC), which can impose hard constraints on control actions, RL-based control may occasionally generate unreasonable or unsafe actions.
This paper compares the proposed RL-based controller to the RBC. While the improvement may appear modest over the two test days, it is important to note that the benchmark RBC was derived through exhaustive simulations, evaluating all possible slat angles to identify the best performance for those specific conditions; the RBC is, in effect, fine-tuned to the simulated results. Such a control policy is unlikely to generalize well across buildings with different orientations, facade geometries, rotation axes, or spatial dimensions. Moreover, deriving such RBCs requires significant computational effort. In contrast, the RL controller learns optimal policies directly from sensor data, making it a more scalable and adaptable solution for diverse real-world applications.
Future research on RL-based control of kinetic responsive facades can focus on evaluating the generalization and stability of the optimal control strategies across different scenarios and degrees of adaptivity. First, all tests were conducted in Boston using the same weather file. Future studies will examine how the geometries and motion characteristics of different shade panel designs perform in diverse climates and weather conditions to broaden the understanding of RL-based performance in other geographic locations. Second, the RL model showed instability, as training the model with the same number of epochs did not consistently yield optimal results. To better evaluate and enhance training stability, future work should involve running multiple training sessions and analyzing the variability in performance. This can help identify patterns and develop more robust training guidelines, ensuring more reliable and repeatable outcomes. Achieving this will also require testing more advanced RL model architectures and reward function designs to improve overall model performance and consistency. Additionally, the training time needed for the RL model makes real-world deployment challenging. Future work will focus on reducing the training time by deploying the model with a pre-trained network and further fine-tuning it through real-world exploration. Moreover, upcoming studies could expand the RL control scope beyond indoor lighting to include indoor thermal comfort. Since solar radiation contributes substantially to internal heat gains, a coordinated RL-based control strategy that manages both lighting and temperature would be highly beneficial. The hardware and practical implementation of RL controllers in real buildings are also vital for future research. Key considerations include sensor placement, computational integration of RL algorithms into control systems, and actuator responsiveness for façade components, all of which are crucial but remain underexplored in the current literature. Addressing these challenges can further enhance the effectiveness and adaptability of RL-controlled shading devices across various environmental contexts, shading components, and building geometries.